Description

FIELD OF THE INVENTION 

[0001] The invention applies the technical field of molecular genetics to evolve the genomes of cells and organisms
to acquire new and improved properties. 

BACKGROUND 

[0002] WO 98/31837 (PCT/US98/00852) provides pioneering technology for evolving the genomes of whole cells and
organisms. One of skill will appreciate that the technology provided in WO 98/31837 is fundamental to the ability of one
of skill to evolve cells and whole organisms rapidly. For example, the document teaches a variety of recursive methods
of artificially recombining nucleic acids in vivo, including entire genomes, and ways of selecting resulting recombinant
organisms.

[0003] This ability to evolve genes artificially is of fundamental importance. For example, cells have a number of
well-established uses in molecular biology, medicine and industrial processes. Cells are commonly used as
hosts for manipulating DNA in processes such as transformation and recombination. Cells are used for expression of
recombinant proteins encoded by DNA transformed/transfected or otherwise introduced into the cells. Some types of
cells are used as progenitors for generation of transgenic animals and plants. Although all of these processes are now
routine, prior to the technology provided by WO 98/31837, the genomes of the cells used in these processes had evolved
little from the genomes of natural cells, and particularly not toward acquisition of new or improved properties for use in
the above processes.

[0004] Additional methods of recursively recombining nucleic acids in vivo and selecting resulting recombinants would 
be of use. The present invention provides a number of new and valuable methods and compositions for whole and partial 
genome evolution.

SUMMARY OF THE INVENTION 

[0005] In one aspect, the invention provides methods of evolving a cell to acquire a desired function. Such methods
entail, e.g., introducing a library of DNA fragments into a plurality of cells, whereby at least one of the fragments undergoes
recombination with a segment in the genome or an episome of the cells to produce modified cells. Optionally, these
modified cells are bred to increase the diversity of the resulting recombined cellular population. The modified cells, or
the recombined cellular population, are then screened for modified or recombined cells that have evolved toward acquisition
of the desired function. DNA from the modified cells that have evolved toward the desired function is then optionally
recombined with a further library of DNA fragments, at least one of which undergoes recombination with a segment in
the genome or the episome of the modified cells to produce further modified cells. The further modified cells are then
screened for further modified cells that have further evolved toward acquisition of the desired function. Steps of recombination
and screening/selection are repeated as required until the further modified cells have acquired the desired
function. In one preferred embodiment, modified cells are recursively recombined to increase diversity of the cells prior
to performing any selection steps on any resulting cells.

[0006] In some methods, the library or further library of DNA fragments is coated with recA protein to stimulate
recombination with the segment of the genome. The library of fragments is optionally denatured to produce single-stranded
DNA, which is annealed to produce duplexes, some of which contain mismatches at points of variation in the fragments.
Duplexes containing mismatches are optionally selected by affinity chromatography to immobilized MutS.

[0007] Optionally, the desired function is secretion of a protein, and the plurality of cells further comprises a construct
encoding the protein. The protein is optionally inactive unless secreted, and further modified cells are optionally selected 
for protein function. Optionally, the protein is toxic to the plurality of cells, unless secreted. In this case, the modified or 
further modified cells which evolve toward acquisition of the desired function are screened by propagating the cells and 
recovering surviving cells. 

[0008] In some methods, the desired function is enhanced recombination. In such methods, the library of fragments
sometimes comprises a cluster of genes collectively conferring recombination capacity. Screening can be achieved 
using cells carrying a gene encoding a marker whose expression is prevented by a mutation removable by recombination. 
The cells are screened by their expression of the marker resulting from removal of the mutation by recombination. 
[0009] In some methods, the plurality of cells are plant cells and the desired property is improved resistance to a
chemical or microbe. The modified or further modified cells (or whole plants) are exposed to the chemical or microbe
and modified or further modified cells having evolved toward the acquisition of the desired function are selected by their
capacity to survive the exposure.

[0010] In some methods, the plurality of cells are embryonic cells of an animal, and the method further comprises
propagating the transformed cells to transgenic animals.

[0011] The plurality of cells can be a plurality of industrial microorganisms that are enriched for microorganisms which
are tolerant to desired process conditions (heat, light, radiation, selected pH, presence of detergents or other denaturants,
presence of alcohols or other organic molecules, etc.).

[0012] The invention further provides methods for performing in vivo recombination. At least first and second segments
from at least one gene are introduced into a cell, the segments differing from each other in at least two nucleotides,
whereby the segments recombine to produce a library of chimeric genes. A chimeric gene having acquired a desired
function is selected from the library.

[0013] The invention further provides methods of predicting efficacy of a drug in treating a viral infection. Such methods
entail recombining a nucleic acid segment from a virus, whose infection is inhibited by a drug, with at least a second
nucleic acid segment from the virus, the second nucleic acid segment differing from the first nucleic acid segment in at
least two nucleotides, to produce a library of recombinant nucleic acid segments. Host cells are then contacted with a
collection of viruses having genomes including the recombinant nucleic acid segments in a medium containing the drug,
and progeny viruses resulting from infection of the host cells are collected.

[0014] A recombinant DNA segment from a first progeny virus recombines with at least a recombinant DNA segment
from a second progeny virus to produce a further library of recombinant nucleic acid segments. Host cells are contacted
with a collection of viruses having genomes including the further library of recombinant nucleic acid segments, in media
containing the drug, and further progeny viruses are produced by the host cells. The recombination and selection steps
are repeated, as desired, until a further progeny virus has acquired a desired degree of resistance to the drug, whereby
the degree of resistance acquired and the number of repetitions needed to acquire it provide a measure of the efficacy
of the drug in treating the virus. Viruses are optionally adapted to grow on particular cell lines.
[0015] The invention further provides methods of predicting efficacy of a drug in treating an infection by a pathogenic
microorganism. These methods entail delivering a library of DNA fragments into a plurality of microorganism cells, at
least some of which undergo recombination with segments in the genome of the cells to produce modified microorganism
cells. Modified microorganisms are propagated in a medium containing the drug, and surviving microorganisms are
recovered. DNA from surviving microorganisms is recombined with a further library of DNA fragments, at least some of
which undergo recombination with cognate segments in the DNA from the surviving microorganisms to produce further
modified microorganism cells. Further modified microorganisms are propagated in media containing the drug, and
further surviving microorganisms are collected. The recombination and selection steps are repeated as needed, until a
further surviving microorganism has acquired a desired degree of resistance to the drug. The degree of resistance
acquired and the number of repetitions needed to acquire it provide a measure of the efficacy of the drug in killing the
pathogenic microorganism.
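
The readout described above (the degree of resistance acquired and the number of repetitions needed to acquire it) can be sketched as a simple counting loop. The following is only an illustrative sketch: the function name rounds_to_resistance, the callback next_round, and all numeric values are hypothetical and do not come from the specification. The callback stands in for one laboratory cycle of recombination and selection and returns the resistance level measured afterwards.

```python
from typing import Callable, Optional, Tuple


def rounds_to_resistance(
    next_round: Callable[[float], float],
    threshold: float,
    start: float = 0.0,
    max_rounds: int = 50,
) -> Tuple[Optional[int], float]:
    """Count recombination/selection cycles until resistance reaches threshold.

    next_round is a hypothetical stand-in for one wet-lab cycle: it takes the
    current resistance level and returns the level measured after the cycle.
    Returns (cycles_used, final_level); cycles_used is None if the threshold
    was not reached within max_rounds.
    """
    level = start
    for n in range(1, max_rounds + 1):
        level = next_round(level)
        if level >= threshold:
            return n, level
    return None, level


# Made-up response curves for two hypothetical drugs (illustration only).
drug_a = lambda level: level + 4.0  # resistance arises quickly
drug_b = lambda level: level + 1.0  # resistance arises slowly

print(rounds_to_resistance(drug_a, threshold=8.0))
print(rounds_to_resistance(drug_b, threshold=8.0))
```

Under this reading, a drug that requires more cycles before the threshold is reached would be scored as harder to escape, which is one way to interpret the efficacy measure stated in the text.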

[0016] The invention further provides methods of evolving a cell to acquire a desired function. These methods entail
providing a population of different cells. The cells are cultured under conditions whereby DNA is exchanged between
cells, forming cells with hybrid genomes. The cells are then screened or selected for cells that have evolved toward
acquisition of a desired property. The DNA exchange and screening/selecting steps are repeated, as needed, with the
screened/selected cells from one cycle forming the population of different cells in the next cycle, until a cell has acquired
the desired property.

[0017] Mechanisms of DNA exchange include conjugation, phage-mediated transduction, liposome delivery, protoplast
fusion, and sexual recombination of the cells. Optionally, a library of DNA fragments can be transformed or electroporated
into the cells.

[0018] As noted, some methods of evolving a cell to acquire a desired property are effected by protoplast-mediated
exchange of DNA between cells. Such methods entail forming protoplasts of a population of different cells. The protoplasts
are then fused to form hybrid protoplasts, in which genomes from the protoplasts recombine to form hybrid genomes.
The hybrid protoplasts are incubated under conditions promoting regeneration of cells. The regenerated cells can be
recombined one or more times (i.e., via protoplasting or any other method that combines genomes of cells) to increase
the diversity of any resulting cells. Preferably, regenerated cells are recombined several times, e.g., by protoplast fusion,
to generate a diverse population of cells.

[0019] The next step is to select or screen to isolate regenerated cells that have evolved toward acquisition of the
desired property. DNA exchange and selection/screening steps are repeated, as needed, with regenerated cells in one
cycle being used to form protoplasts in the next cycle until the regenerated cells have acquired the desired property.
Industrial microorganisms are a preferred class of organisms for conducting the above methods. Some methods further
comprise a step of selecting or screening for fused protoplasts free from unfused protoplasts of parental cells. Some
methods further comprise a step of selecting or screening for fused protoplasts with hybrid genomes free from cells with
parental genomes. In some methods, protoplasts are provided by treating individual cells, mycelia or spores with an
enzyme that degrades cell walls. In some methods, the strain is a mutant that is lacking capacity for intact cell wall
synthesis, and protoplasts form spontaneously. In some methods, protoplasts are formed by treating growing cells with
an inhibitor of cell wall formation to generate protoplasts.
[0020] In some methods, the desired property is expression and/or secretion of a protein or metabolite,
such as an industrial enzyme, a therapeutic protein, a primary metabolite such as lactic acid or ethanol, or a secondary
metabolite such as erythromycin, cyclosporin A or taxol. In other methods it is the ability of the cell to convert compounds
provided to the cell to different compounds. In yet other methods, the desired property is capacity for meiosis. In some
methods, the desired property is compatibility to form a heterokaryon with another strain.

[0021] The invention further provides methods of evolving a cell toward acquisition of a desired property. These
methods entail providing a population of different cells. DNA is isolated from a first subpopulation of the different cells
and encapsulated in liposomes. Protoplasts are formed from a second subpopulation of the different cells. Liposomes
are fused with the protoplasts, whereby DNA from the liposomes is taken up by the protoplasts and recombines with
the genomes of the protoplasts. The protoplasts are incubated under regenerating conditions. Regenerating or regenerated
cells are then selected or screened for evolution toward the desired property.

[0022] The invention further provides methods of evolving a cell toward acquisition of a desired property using artificial
chromosomes. Such methods entail introducing a DNA fragment library cloned into an artificial chromosome into a
population of cells. The cells are then cultured under conditions whereby sexual recombination occurs between the cells,
and DNA fragments cloned into the artificial chromosome recombine by homologous recombination with corresponding
segments of endogenous chromosomes of the population of cells, and endogenous chromosomes recombine with
each other. Cells can also be recombined via conjugation. Any resulting cells can be recombined via any method noted
herein, as many times as desired, to generate a desired level of diversity in the resulting recombinant cells. In any case,
after generating a diverse library of cells, the cells are screened and/or selected for those that have evolved toward
acquisition of the desired property. The method is then repeated with cells that have evolved toward the
desired property in one cycle forming the population of different cells in the next cycle. Here again, multiple cycles of in
vivo recombination are optionally performed prior to any additional selection or screening steps.
[0023] The invention further provides methods of evolving a DNA segment cloned into an artificial chromosome for
acquisition of a desired property. These methods entail providing a library of variants of the segment, each variant cloned
into separate copies of an artificial chromosome. The copies of the artificial chromosome are introduced into a population
of cells. The cells are cultured under conditions whereby sexual recombination occurs between cells and homologous
recombination occurs between copies of the artificial chromosome bearing the variants. Variants are then screened or
selected for evolution toward acquisition of the desired property.

[0024] The invention further provides hyperrecombinogenic recA proteins. Examples of such proteins are from clones
2, 4, 5, 6 and 13 shown in Fig. 13.

[0025] The invention also provides methods of reiterative pooling and breeding of higher organisms. In the methods, a
library of diverse multicellular organisms is produced (e.g., plants, animals or the like). A pool of male gametes is
provided along with a pool of female gametes. At least one of the male pool or the female pool comprises a plurality of
different gametes derived from different strains of a species or different species. The male gametes are used to fertilize
the female gametes. At least a portion of the resulting fertilized gametes grow into reproductively viable organisms.
These reproductively viable organisms are crossed (e.g., by pairwise pooling and joining of the male and female gametes
as before) to produce a library of diverse organisms. The library is then selected for a desired trait or property.
[0026] The library of diverse organisms can comprise a plurality of plants such as Gramineae, Festucoideae, Poacoideae,
Agrostis, Phleum, Dactylis, Sorghum, Setaria, Zea, Oryza, Triticum, Secale, Avena, Hordeum, Saccharum, Poa,
Festuca, Stenotaphrum, Cynodon, Coix, Olyreae, Phareae, Compositae or Leguminosae. For example, the plants can
be corn, rice, wheat, rye, oats, barley, pea, beans, lentil, peanut, yam bean, cowpeas, velvet beans, soybean,
clover, alfalfa, lupine, vetch, lotus, sweet clover, wisteria, sweetpea, sorghum, millet, sunflower, canola or the like.
[0027] Similarly, the library of diverse organisms can include a plurality of animals such as non-human mammals, fish,
insects, or the like.

[0028] Optionally, a plurality of selected library members can be crossed by pooling gametes from the selected members
and repeatedly crossing any resulting additional reproductively viable organisms to produce a second library of
diverse organisms (e.g., by split pairwise pooling and rejoining of the male and female gametes). Here again, the second
library can be selected for a desired trait or property, with the resulting selected members forming the basis for additional
poolwise breeding and selection.

[0029] A feature of the invention is the libraries made by these (or any preceding) methods.

BRIEF DESCRIPTION OF THE DRAWINGS
[0030] 

Fig. 1, panels A-D: Scheme for in vitro shuffling of genes.

Fig. 2: Scheme for enriching for mismatched sequences using MutS. 

Fig. 3: Alternative scheme for enriching for mismatched sequences using MutS. 
Fig. 4: Scheme for evolving growth hormone genes to produce larger fish.
Fig. 5: Scheme for shuffling prokaryotes by protoplast fusion. 

Fig. 6: Scheme for introducing a sexual cycle into fungi previously incapable of sexual reproduction.
Fig. 7: General scheme for shuffling of fungi by protoplast fusion. 

Fig. 8: Shuffling fungi by protoplast fusion with protoplasts generated by use of inhibitors of enzymes responsible 
for cell wall formation. 

Fig. 9: Shuffling fungi by protoplast fusion using fungal strains deficient in cell-wall synthesis that spontaneously 
form protoplasts. 

Fig. 10: YAC-mediated whole genome shuffling of Saccharomyces cerevisiae and related organisms. 
Fig. 11: YAC-mediated shuffling of large DNA fragments.

Fig. 12: (A, B, C and D) DNA sequences of a wildtype recA protein and five hyperrecombinogenic variants thereof. 
Fig. 13: Amino acid sequences of a wildtype recA protein and five hyperrecombinogenic variants thereof.
Fig. 14: illustration of combinatoriality. 

Fig. 15: Repeated pairwise recombination to access multi-mutant progeny. 

Fig. 16: graph of fitness versus sequence space for three different mutation strategies. 

Fig. 17: graphs of asexual sequential mutagenesis and sexual recursive recombination. 

Fig. 18: Schematic for non-homologous recombination. 

Fig. 19: Schematic for split and pool strategy. 

Fig. 20, panel A: schematic for selectable/counterselectable marker strategy.

Fig. 20, panel B: schematic for selectable/ counterselectable marker strategy for Rec A. 

Fig. 21: plant regeneration strategy for regenerating salt-tolerant plants.

Fig. 22: Whole genome shuffling of parsed (subcloned) genomes. 

Fig. 23: Schematic for blind cloning of gene homologs. 

Fig. 24: High throughput family shuffling. 

Fig. 25: Schematic and graph of poolwise recombination. 

Fig. 26: Schematic of protoplast fusion. 

Fig. 27: Schematic assay for poolwise recombination. 

Fig. 28: Schematic of halo assay and integrated system. 

Fig. 29: Schematic drawing illustrating recursive pooled breeding of fish. 

Fig. 30: Schematic drawing illustrating recursive pooled breeding of plants. 

Fig. 31: Schematic for shuffling of S. coelicolor.

Fig. 32: schematic drawing illustrating HTP actinorhodin assay.

Fig. 33: schematic drawing and table illustrating whole genome shuffling of four parental strains.
Fig. 34: schematic drawing of WGS through organized heteroduplex shuffling. 

DETAILED DESCRIPTION 

I. GENERAL 

A. THE BASIC APPROACH 

[0031] The invention provides methods for artificially evolving cells to acquire a new or improved property by recursive
sequence recombination. Briefly, recursive sequence recombination entails successive cycles of recombination to generate
molecular diversity and screening/selection to take advantage of that molecular diversity. That is, a family of nucleic
acid molecules is created showing substantial sequence and/or structural identity but differing as to the presence of
mutations. These sequences are then recombined in any of the described formats so as to optimize the diversity of
mutant combinations represented in the resulting recombined library. Typically, any resulting recombinant nucleic acids
or genomes are recursively recombined for one or more cycles of recombination to increase the diversity of resulting
products. After this recursive recombination procedure, the final resulting products are screened and/or selected for a
desired trait or property.

[0032] Alternatively, each recombination cycle can be followed by at least one cycle of screening or selection for molecules
having a desired characteristic. In this embodiment, the molecule(s) selected in one round form the starting materials 
for generating diversity in the next round. 
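
The two formats just described (several cycles of recombination before one screen, or a screen after each recombination cycle) can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the method of the specification: genotypes are modelled as tuples of 0/1 alleles, recombination as a single-point crossover, and the screen as a count of beneficial alleles. All function names and parameters (recombine, fitness, evolve, recombinations_per_cycle, keep) are hypothetical.

```python
import random


def recombine(a, b, rng):
    """Single-point crossover between two equal-length genotypes."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def fitness(genotype):
    """Toy screen: number of beneficial alleles (1s) in the genotype."""
    return sum(genotype)


def evolve(population, cycles, keep, rng, recombinations_per_cycle=1):
    """Alternate recombination and screening.

    Each cycle performs recombinations_per_cycle rounds of pairwise
    recombination (values above 1 model recursive recombination before
    any screening), then keeps the `keep` fittest genotypes, which seed
    the next cycle. `keep` must be at least 2.
    """
    size = len(population)
    for _ in range(cycles):
        for _ in range(recombinations_per_cycle):
            population = [
                recombine(*rng.sample(population, 2), rng) for _ in range(size)
            ]
        population = sorted(population, key=fitness, reverse=True)[:keep]
    return population


# Four hypothetical parents, each carrying one distinct beneficial mutation.
parents = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
survivors = evolve(parents, cycles=3, keep=2, rng=random.Random(0))
print(survivors)
```

Setting recombinations_per_cycle above 1 corresponds to the first format (recursive recombination, then a single screen); leaving it at 1 corresponds to the alternative format in which each recombination cycle is followed by screening.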

[0033] The cells to be evolved can be bacteria, archaebacteria, or eukaryotic cells and can constitute a homogeneous
cell line or mixed culture. Suitable cells for evolution include the bacterial and eukaryotic cell lines commonly used in
genetic engineering, protein expression, or the industrial production or conversion of proteins, enzymes, primary metabolites,
secondary metabolites, fine, specialty or commodity chemicals. Suitable mammalian cells include those from,
e.g., mouse, rat, hamster, primate, and human, both cell lines and primary cultures. Such cells include stem cells,
including embryonic stem cells and hemopoietic stem cells, zygotes, fibroblasts, lymphocytes, Chinese hamster ovary
(CHO), mouse fibroblasts (NIH3T3), kidney, liver, muscle, and skin cells. Other eukaryotic cells of interest include plant
cells, such as maize, rice, wheat, cotton, soybean, sugarcane, tobacco, and arabidopsis; fish, algae, fungi (penicillium,
aspergillus, podospora, neurospora, saccharomyces), insect (e.g., baculo lepidoptera), yeast (pichia and saccharomyces,
Schizosaccharomyces pombe). Also of interest are many bacterial cell types, both gram-negative and gram-positive,
such as Bacillus subtilis, B. licheniformis, B. cereus, Escherichia coli, Streptomyces, Pseudomonas, Salmonella,
Actinomycetes, Lactobacillus, Acetonitcbacter, Deinococcus, and Erwinia. The complete genome sequences of
E. coli and Bacillus subtilis are described by Blattner et al., Science 277, 1454-1462 (1997); Kunst et al., Nature 390,
249-256 (1997).

[0034] Evolution commences by generating a population of variant cells. Typically, the cells in the population are of
the same type but represent variants of a progenitor cell. In some instances, the variation is natural, as when different
cells are obtained from different individuals within a species, from different species or from different genera. In other
instances, variation is induced by mutagenesis of a progenitor cell. Mutagenesis can be effected by subjecting the cell
to mutagenic agents, or if the cell is a mutator cell (e.g., has mutations in genes involved in DNA replication, recombination
and/or repair which favor introduction of mutations) simply by propagating the mutator cells. Mutator cells can be generated
from successive selections for simple phenotypic changes (e.g., acquisition of rifampicin-resistance, then nalidixic
acid resistance, then lac- to lac+ (see Mao et al., J. Bacteriol. 179, 417-422 (1997))), or mutator cells can be generated
by exposure to specific inhibitors of cellular factors that result in the mutator phenotype. These could be inhibitors of
mutS, mutL, mutD, recD, mutY, mutM, dam, uvrD and the like.

[0035] More generally, mutations are induced in cell populations using any available mutation technique. Common
mechanisms for inducing mutations include, but are not limited to, the use of strains comprising mutations such as those 
involved in mismatch repair, e.g. mutations in mutS, mutT, mutL and mutH; exposure to UV light; chemical mutagenesis,
e.g. use of inhibitors of MMR, DNA damage inducible genes, or SOS inducers; overproduction/underproduction/mutation
of any component of the homologous recombination complex/pathway, e.g. RecA, ssb, etc.; overproduction/underproduction/mutation
of genes involved in DNA synthesis/homeostasis; overproduction/underproduction/mutation of recombination-stimulating
genes from bacteria, phage (e.g. Lambda Red function), or other organisms; addition of chi
sites into/flanking the donor DNA fragments; coating the DNA fragments with RecA/ssb and the like.
[0036] In other instances, variation is the result of transferring a library of DNA fragments into the cells (e.g., by
conjugation, protoplast fusion, liposome fusion, transformation, transduction or natural competence). At least one, and
usually many, of the fragments in the library show some, but not complete, sequence or structural identity with a cognate
or allelic gene within the cells sufficient to allow homologous recombination to occur. For example, in one embodiment,
homologous integration of a plasmid carrying a shuffled gene or metabolic pathway leads to insertion of the plasmid-borne
sequences adjacent to the genomic copy. Optionally, a counter-selectable marker strategy is used to select for
recombinants in which recombination occurred between the homologous sequences, leading to elimination of the
counter-selectable marker. This strategy is illustrated in Fig. 20A. A variety of selectable and counter-selectable markers are
amply illustrated in the art. For a list of useful markers, see Berg and Berg (1996), "Transposable element tools for
microbial genetics" Escherichia coli and Salmonella Neidhardt, Washington, D.C., ASM Press. 2: 2588-2612; LaRossa,
ibid., 2527-2587. This strategy can be recursively repeated to maximize sequence diversity of targeted genes prior to
screening/selection for a desired trait or property.

[0037] The library of fragments can derive from one or more sources. One source of fragments is a genomic library
of fragments from a different species, cell type, organism or individual from the cells being transfected. In this situation,
many of the fragments in the library have a cognate or allelic gene in the cells being transformed but differ from that
gene due to the presence of naturally occurring species variation, polymorphisms, mutations, and the presence of
multiple copies of some homologous genes in the genome. Alternatively, the library can be derived from DNA from the
same cell type as is being transformed after that DNA has been subject to induced mutation, by conventional methods,
such as radiation, error-prone PCR, growth in a mutator organism, transposon mutagenesis, or cassette mutagenesis.
Alternatively, the library can derive from a genomic library of fragments generated from the pooled genomic DNA of a
population of cells having the desired characteristics.

[0038] In any of these situations, the genomic library can be a complete genomic library or subgenomic library deriving,
for example, from a selected chromosome, or part of a chromosome or an episomal element within a cell. As well as,
or instead of these sources of DNA fragments, the library can contain fragments representing natural or selected variants
of selected genes of known function (i.e., focused libraries).

[0039] The number of fragments in a library can vary from a single fragment to about 10^^, with libraries having from
10^ to 10^ fragments being common. The fragments should be sufficiently long that they can undergo homologous
recombination and sufficiently short that they can be introduced into a cell, and if necessary, manipulated before introduction.
Fragment sizes can range from about 10 b to about 20 Mb. Fragments can be double- or single-stranded.
[0040] The fragments can be introduced into cells as whole genomes or as components of viruses, plasmids, YACs,
HACs or BACs or can be introduced as they are, in which case all or most of the fragments lack an origin of replication.
Use of viral fragments with single-stranded genomes offers the advantage of delivering fragments in single-stranded form,
which promotes recombination. The fragments can also be joined to a selective marker before introduction. Inclusion of
fragments in a vector having an origin of replication affords a longer period of time after introduction into the cell in which
fragments can undergo recombination with a cognate gene before being degraded or selected against and lost from the
cell, thereby increasing the proportion of cells with recombinant genomes. Optionally, the vector is a suicide vector
capable of a longer existence than an isolated DNA fragment but not capable of permanent retention in the cell line.
Such a vector can transiently express a marker for a sufficient time to screen for or select a cell bearing the vector (e.g.,
because cells transduced by the vector are the target cell type to be screened in subsequent selection assays), but is
then degraded or otherwise rendered incapable of expressing the marker. The use of such vectors can be advantageous
in performing optional subsequent rounds of recombination to be discussed below. For example, some suicide vectors
express a long-lived toxin which is neutralized by a short-lived molecule expressed from the same vector. Expression
of the toxin alone will not allow the vector to be established. Jensen & Gerdes, Mol. Microbiol. 17, 205-210 (1995); Bernard
et al., Gene 162, 159-160. Alternatively, a vector can be rendered suicidal by incorporation of a defective origin of
replication (e.g. a temperature-sensitive origin of replication) or by omission of an origin of replication. Vectors can also
be rendered suicidal by inclusion of negative selection markers, such as ura3 in yeast or sacB in many bacteria. These
genes become toxic only in the presence of specific compounds. Such vectors can be selected to have a wide range of
stabilities. A list of conditional replication defects for vectors which can be used, e.g., to render the vector replication
defective is found, e.g., in Berg and Berg (1996), "Transposable element tools for microbial genetics" Escherichia coli
and Salmonella Neidhardt, Washington, D.C., ASM Press. 2: 2588-2612. Similarly, a list of counterselectable markers,
generally applicable to vector selection, is also found in Berg and Berg, id. See also LaRossa (1996) "Mutant selections
linking physiology, inhibitors, and genotypes" Escherichia coli and Salmonella F. C. Neidhardt, Washington, D.C., ASM
Press. 2: 2527-2587.

[0041] After introduction into cells, the fragments can recombine with DNA present in the genome or episomes of the
cells by homologous, nonhomologous or site-specific recombination. For present purposes, homologous recombination
makes the most significant contribution to evolution of the cells because this form of recombination amplifies the existing
diversity between the DNA of the cells being transfected and the DNA fragments. For example, if a DNA fragment being
transfected differs from a cognate or allelic gene at two positions, there are four possible recombination products, and
each of these recombination products can be formed in different cells in the transformed population. Thus, homologous
recombination of the fragment doubles the initial diversity in this gene. When many fragments recombine with
corresponding cognate or allelic genes, the diversity of recombination products with respect to starting products increases
exponentially with the number of mutations. Recombination results in modified cells having modified genomes and/or
episomes. Recursive recombination prior to selection further increases diversity of resulting modified cells.
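
The counting argument above (two differing positions give four products) can be illustrated with a minimal Python sketch. It assumes, for illustration only, that the two sequences are already aligned to equal length and that a crossover can occur between any pair of differing positions; the function name and the example sequences are hypothetical.

```python
from itertools import product


def recombination_products(parent_a: str, parent_b: str) -> set[str]:
    """Enumerate every sequence obtainable by taking, at each position where
    two aligned, equal-length parents differ, the base from either parent."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned to equal length")
    # Positions that agree offer one choice; positions that differ offer two.
    choices = [(a,) if a == b else (a, b) for a, b in zip(parent_a, parent_b)]
    return {"".join(combo) for combo in product(*choices)}


genomic = "ATGCCGTA"   # hypothetical cognate gene in the recipient cell
fragment = "ATGACGTT"  # hypothetical library fragment, differing at two positions
products = recombination_products(genomic, fragment)
print(len(products), sorted(products))
```

With n differing positions the set holds 2^n sequences, which is the exponential growth in diversity described in the paragraph.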
[0042] The variant cells, whether the result of natural variation, mutagenesis, or recombination, are screened or selected
to identify a subset of cells that have evolved toward acquisition of a new or improved property. The nature of the screen,
of course, depends on the property and several examples will be discussed below. Typically, recombination is repeated
before initial screening. Optionally, however, the screening can also be repeated before performing subsequent cycles
of recombination. Stringency can be increased in repeated cycles of screening.

[0043] The subpopulation of cells surviving screening are optionally subjected to a further round of recombination. In
some instances, the further round of recombination is effected by propagating the cells under conditions allowing
exchange of DNA between cells. For example, protoplasts can be formed from the cells, allowed to fuse, and regenerated.
Cells with recombinant genomes are propagated from the fused protoplasts. Alternatively, exchange of DNA can be
promoted by propagation of cells or protoplasts in an electric field. For cells having a conjugative transfer apparatus,
exchange of DNA can be promoted simply by propagating the cells.

[0044] In other methods, the further round of recombination is performed by a split and pool approach. That is, the
surviving cells are divided into two pools. DNA is isolated from one pool, and if necessary amplified, and then transformed
into the other pool. Accordingly, DNA fragments from the first pool constitute a further library of fragments and recombine
with cognate fragments in the second pool resulting in further diversity. An example of this strategy is illustrated in Fig.
19. As shown, a pool of mutant bacteria with improvements in a desired phenotype is obtained and split. Genes are
obtained from one half, e.g., by PCR, by cloning of random genomic fragments, by infection with a transducing phage
and harvesting transducing particles, or by the introduction of an origin of transfer (OriT) randomly into the relevant
chromosome to create a donor population of cells capable of transferring random fragments by conjugation to an acceptor
population. These genes are then shuffled (in vitro by known methods or in vivo as taught herein), or simply cloned into
an allele replacement vector (e.g., one carrying selectable and counter-selectable markers). The gene pool is then
transformed into the other half of the original mutant pool and recombinants are selected and screened for further
improvements in phenotype. These best variants are used as the starting point for the next cycle. Alternatively, recursive
recombination by any of the methods noted can be performed prior to screening, thereby increasing the diversity of the
population of cells to be screened.

[0045] In other methods, some or all of the cells surviving screening are transfected with a fresh library of DNA
fragments, which can be the same or different from the library used in the first round of recombination. In this situation,
the genes in the fresh library undergo recombination with cognate genes in the surviving cells. If genes are introduced
as components of a vector, compatibility of this vector with any vector used in a previous round of transfection should
be considered. If the vector used in a previous round was a suicide vector, there is no problem of incompatibility. If,
however, the vector used in a previous round was not a suicide vector, a vector having a different incompatibility origin
should be used in the subsequent round. In all of these formats, further recombination generates additional diversity in
the DNA component of the cells resulting in further modified cells.

[0046] The further modified cells are subjected to another round of screening/selection according to the same principles
as the first round. Screening/selection identifies a subpopulation of further modified cells that have further evolved toward
acquisition of the property. This subpopulation of cells can be subjected to further rounds of recombination and screening
according to the same principles, optionally with the stringency of screening being increased at each round. Eventually,
cells are identified that have acquired the desired property.
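
The recursive cycle of recombination followed by screening at increasing stringency can be pictured with a toy simulation. This is only an illustrative sketch under loudly stated assumptions: genomes are modeled as tuples of 0/1 alleles, the "screen" is a hypothetical fitness function that counts favorable alleles, recombination is uniform crossover between randomly paired cells, and the parameter names (keep_fraction, tighten) are invented for the example.

```python
import random


def fitness(genome):
    # Hypothetical stand-in for a screen: count favorable alleles (1s).
    return sum(genome)


def recombine(a, b, rng):
    # Uniform crossover: each position is inherited from either parent.
    return tuple(rng.choice(pair) for pair in zip(a, b))


def evolve(population, rounds, rng, keep_fraction=0.5, tighten=0.8):
    """Alternate recombination and screening; return survivors and the best
    fitness observed at the start and after every round."""
    best_per_round = [max(map(fitness, population))]
    for _ in range(rounds):
        # Recombination: pair random cells; parental cells stay in the pool.
        offspring = [recombine(*rng.sample(population, 2), rng)
                     for _ in range(len(population))]
        pool = population + offspring
        # Screening: retain only the top fraction of the pool.
        pool.sort(key=fitness, reverse=True)
        population = pool[:max(2, int(len(pool) * keep_fraction))]
        # Increase stringency for the next round.
        keep_fraction *= tighten
        best_per_round.append(fitness(population[0]))
    return population, best_per_round


rng = random.Random(0)
start = [tuple(rng.randint(0, 1) for _ in range(12)) for _ in range(40)]
survivors, best = evolve(start, rounds=5, rng=rng)
print(best)
```

Because parental cells are retained alongside recombinants before each screen, the best score in this sketch never decreases from one round to the next; a real screen would of course measure a phenotype rather than count alleles.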

II. DEFINITIONS

[0047] The term cognate refers to a genic sequence that is evolutionarily and functionally related between species.
For example, in the human genome, the human CD4 gene is the cognate gene to the mouse CD4 gene, since the
sequences and structures of these two genes indicate that they are homologous and that both genes encode a protein
which functions in signaling T-cell activation through MHC class II-restricted antigen recognition.

[0048] Screening is, in general, a two-step process in which one first determines which cells do and do not express
a screening marker or phenotype (or a selected level of marker or phenotype), and then physically separates the cells
having the desired property. Selection is a form of screening in which identification and physical separation are achieved
simultaneously by expression of a selection marker, which, in some genetic circumstances, allows cells expressing the
marker to survive while other cells die (or vice versa). Screening markers include luciferase, β-galactosidase, and green
fluorescent protein. Selection markers include drug and toxin resistance genes.

[0049] An exogenous DNA segment is one foreign (or heterologous) to the cell or homologous to the cell but in a
position within the host cell nucleic acid in which the element is not ordinarily found. Exogenous DNA segments can be
expressed to yield exogenous polypeptides.

[0050] The term "gene" is used broadly to refer to any segment of DNA associated with a biological function. Thus,
genes include coding sequences and/or the regulatory sequences required for their expression. Genes also include
nonexpressed DNA segments that, for example, form recognition sequences for other proteins.

[0051] The terms "identical" or "percent identity," in the context of two or more nucleic acids or polypeptide sequences,
refer to two or more sequences or subsequences that are the same or have a specified percentage of amino acid residues
or nucleotides that are the same, when compared and aligned for maximum correspondence, as measured using one
of the following sequence comparison algorithms or by visual inspection.

[0052] The phrase "substantially identical," in the context of two nucleic acids or polypeptides, refers to two or more
sequences or subsequences that have at least 60%, preferably 80%, most preferably 90-95% nucleotide or amino acid
residue identity, when compared and aligned for maximum correspondence, as measured using one of the following
sequence comparison algorithms or by visual inspection. Preferably, the substantial identity exists over a region of the
sequences that is at least about 50 residues in length, more preferably over a region of at least about 100 residues, and
most preferably the sequences are substantially identical over at least about 150 residues. In a most preferred
embodiment, the sequences are substantially identical over the entire length of the coding regions.
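
As a rough illustration of the two definitions above, the following Python sketch computes percent identity for two sequences that have already been aligned (gaps written as "-") and applies the 60% identity and 50-residue thresholds. The choice of denominator (all alignment columns that contain at least one residue) is an assumption made for this example, since the text does not fix one, and the function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of compared alignment columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no residues to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def substantially_identical(aligned_a, aligned_b,
                            min_identity=60.0, min_length=50):
    """Apply an identity threshold over a minimum compared length."""
    compared = sum(1 for a, b in zip(aligned_a, aligned_b)
                   if a != "-" or b != "-")
    return (compared >= min_length
            and percent_identity(aligned_a, aligned_b) >= min_identity)


ref = "ACGT" * 15    # hypothetical 60-residue reference
test = "ACGA" * 15   # differs at every fourth position
print(percent_identity(ref, test))
print(substantially_identical(ref, test))
print(substantially_identical(ref, test, min_identity=80.0))
```

Raising min_identity to 80 or 90 and min_length to 100 or 150 gives the "preferably" and "most preferably" tiers named in the paragraph.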

[0053] For sequence comparison, typically one sequence acts as a reference sequence, to which test sequences are
compared. When using a sequence comparison algorithm, test and reference sequences are input into a computer,
subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated.
The sequence comparison algorithm then calculates the percent sequence identity for the test sequence(s) relative to
the reference sequence, based on the designated program parameters.

[0054] Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of
Smith & Waterman, Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J.
Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson & Lipman, Proc. Natl. Acad. Sci. USA 85:2444
(1988), by computerized implementations of algorithms GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics
Software Package Release 7.0, Genetics Computer Group, 575 Science Dr., Madison, WI.
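
For readers unfamiliar with local alignment, a minimal Python sketch of Smith-Waterman scoring follows. It is not the GCG implementation named above; the scoring values (match +1, mismatch -1, linear gap penalty -1) and the function name are assumptions chosen only to keep the example small, and it returns the best local score rather than the alignment itself.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

The Needleman-Wunsch global method differs mainly in dropping the floor at zero and in scoring the alignment end to end.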

[0055] Another example of a useful alignment algorithm is PILEUP. PILEUP creates a multiple sequence alignment
from a group of related sequences using progressive, pairwise alignments to show relationship and percent sequence
identity. It also plots a tree or dendrogram showing the clustering relationships used to create the alignment. PILEUP
uses a simplification of the progressive alignment method of Feng & Doolittle, J. Mol. Evol. 35:351-360 (1987). The
method used is similar to the method described by Higgins & Sharp, CABIOS 5:151-153 (1989). The program can align
up to 300 sequences, each of a maximum length of 5,000 nucleotides or amino acids. The multiple alignment procedure
begins with the pairwise alignment of the two most similar sequences, producing a cluster of two aligned sequences.
This cluster is then aligned to the next most related sequence or cluster of aligned sequences. Two clusters of sequences
are aligned by a simple extension of the pairwise alignment of two individual sequences. The final alignment is achieved
by a series of progressive, pairwise alignments. The program is run by designating specific sequences and their amino
acid or nucleotide coordinates for regions of sequence comparison and by designating the program parameters. For
example, a reference sequence can be compared to other test sequences to determine the percent sequence identity
relationship using the following parameters: default gap weight (3.00), default gap length weight (0.10), and weighted
end gaps.

[0056] Another example of an algorithm that is suitable for determining percent sequence identity and sequence similarity
is the BLAST algorithm, which is described in Altschul et al., J. Mol. Biol. 215:403-410 (1990). Software for performing
BLAST analyses is publicly available through the National Center for Biotechnology Information
(http://www.ncbi.nlm.nih.gov/). This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words
of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned
with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold
(Altschul et al., supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs
containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative
alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters
M (reward score for a pair of matching residues; always > 0) and N (penalty score for mismatching residues; always <
0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in
each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved
value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue
alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the
sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a wordlength
(W) of 11, an expectation (E) of 10, M=5, N=-4, and a comparison of both strands. For amino acid sequences, the
BLASTP program uses as defaults a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix
(see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 (1989)).
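
The seed-and-extend procedure described above can be sketched in a few lines of Python. This is a simplified illustration, not the NCBI implementation: it seeds only on exact word matches (the neighborhood threshold T is omitted), performs ungapped extension on one strand, and uses an X value of 20 that is an assumption for the example. W=11, M=5 and N=-4 follow the BLASTN defaults quoted in the text; the function names are hypothetical.

```python
def extend(query, subject, qi, si, step, start_score, reward, penalty, x_drop):
    """Extend a hit in one direction; return (best score, residues added)."""
    score = best = start_score
    length = best_len = 0
    while 0 <= qi < len(query) and 0 <= si < len(subject):
        score += reward if query[qi] == subject[si] else penalty
        length += 1
        if score > best:
            best, best_len = score, length
        # Halt on an X drop-off from the maximum, or a score of zero or below.
        if best - score >= x_drop or score <= 0:
            break
        qi += step
        si += step
    return best, best_len


def find_hsps(query, subject, w=11, reward=5, penalty=-4, x_drop=20):
    """Return sorted (score, query_start, subject_start, length) tuples."""
    words = {}
    for i in range(len(query) - w + 1):
        words.setdefault(query[i:i + w], []).append(i)
    hsps = set()
    for s in range(len(subject) - w + 1):
        for q in words.get(subject[s:s + w], ()):
            seed = w * reward  # score of the exactly matching word
            right, r_len = extend(query, subject, q + w, s + w, +1,
                                  seed, reward, penalty, x_drop)
            total, l_len = extend(query, subject, q - 1, s - 1, -1,
                                  right, reward, penalty, x_drop)
            hsps.add((total, q - l_len, s - l_len, w + l_len + r_len))
    return sorted(hsps, reverse=True)


query = "ACGTTGCAAGCTAGGCTTAC"
subject = "ACGTTGCAAGCTAGGATTAC"  # one mismatch relative to the query
print(find_hsps(query, subject)[0])
```

In this example the extension tolerates the single mismatch because the score falls by only 4 from its maximum, well short of the assumed X of 20, and then recovers.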

[0057] In addition to calculating percent sequence identity, the BLAST algorithm also performs a statistical analysis
of the similarity between two sequences (see, e.g., Karlin & Altschul, Proc. Natl. Acad. Sci. USA 90:5873-5787 (1993)).
One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an
indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance.
For example, a nucleic acid is considered similar to a reference sequence if the smallest sum probability in a comparison
of the test nucleic acid to the reference nucleic acid is less than about 0.1, more preferably less than about 0.01, and
most preferably less than about 0.001.

[0058] A further indication that two nucleic acid sequences or polypeptides are substantially identical is that the
polypeptide encoded by the first nucleic acid is immunologically cross reactive with the polypeptide encoded by the second
nucleic acid, as described below. Thus, a polypeptide is typically substantially identical to a second polypeptide, for
example, where the two peptides differ only by conservative substitutions. Another indication that two nucleic acid
sequences are substantially identical is that the two molecules hybridize to each other under stringent conditions.

[0059] The term "naturally-occurring" is used to describe an object that can be found in nature. For example, a
polypeptide or polynucleotide sequence that is present in an organism (including viruses) that can be isolated from a source in
nature and which has not been intentionally modified by man in the laboratory is naturally-occurring. Generally, the term
naturally-occurring refers to an object as present in a non-pathological (undiseased) individual, such as would be typical
for the species.

[0060] Asexual recombination is recombination occurring without the fusion of gametes to form a zygote.

[0061] A "mismatch repair deficient strain" can include any mutants in any organism impaired in the functions of
mismatch repair. These include mutant gene products of mutS, mutT, mutH, mutL, uvrD, dcm, vsr, umuC, umuD, sbcB,
recJ, etc. The impairment is achieved by genetic mutation, allelic replacement, selective inhibition by an added reagent
such as a small compound or an expressed antisense RNA, or other techniques. Impairment can be of the genes noted,
or of homologous genes in any organism.

III. VARIATIONS 

A. COATING FRAGMENTS WITH RECA PROTEIN 

[0062] The frequency of homologous recombination between library fragments and cognate endogenous genes can
be increased by coating the fragments with a recombinogenic protein before introduction into cells. See Pati et al.,
Molecular Biology of Cancer 1, 1 (1996); Sena & Zarling, Nature Genetics 3, 365 (1996); Revet et al., J. Mol. Biol. 232,
779-791 (1993); Kowalczkowski & Zarling in Gene Targeting (CRC 1995), Ch. 7. The recombinogenic protein promotes
homologous pairing and/or strand exchange. The best characterized recA protein is from E. coli and is available from
Pharmacia (Piscataway, NJ). In addition to the wild-type protein, a number of mutant recA-like proteins have been
identified (e.g., recA803). Further, many organisms have recA-like recombinases with strand-transfer activities (e.g.,
Ogawa et al., Cold Spring Harbor Symposium on Quantitative Biology 18, 567-576 (1993); Johnson & Symington, Mol.
Cell. Biol. 15, 4843-4850 (1995); Fugisawa et al., Nucl. Acids Res. 13, 7473 (1985); Hsieh et al., Cell 44, 885 (1986);
Hsieh et al., J. Biol. Chem. 264, 5089 (1989); Fishel et al., Proc. Natl. Acad. Sci. USA 85, 3683 (1988); Cassuto et al.,
Mol. Gen. Genet. 208, 10 (1987); Ganea et al., Mol. Cell Biol. 7, 3124 (1987); Moore et al., J. Biol. Chem. 19, 11108
(1990); Keene et al., Nucl. Acids Res. 12, 3057 (1984); Kimiec, Cold Spring Harbor Symp. 48, 675 (1984); Kimeic, Cell
44, 545 (1986); Kolodner et al., Proc. Natl. Acad. Sci. USA 84, 5560 (1987); Sugino et al., Proc. Natl. Acad. Sci. USA
85, 3683 (1985); Halbrook et al., J. Biol. Chem. 264, 21403 (1989); Eisen et al., Proc. Natl. Acad. Sci. USA 85, 7481
(1988); McCarthy et al., Proc. Natl. Acad. Sci. USA 85, 5854 (1988); Lowenhaupt et al., J. Biol. Chem. 264, 20568
(1989)). Examples of such recombinase proteins include recA, recA803, uvsX (Roca, A.I., Crit. Rev. Biochem. Molec.
Biol. 25, 415 (1990)), sep1 (Kolodner et al., Proc. Natl. Acad. Sci. (U.S.A.) 84, 5560 (1987); Tishkoff et al., Molec. Cell.
Biol. 11, 2593), RuvC (Dunderdale et al., Nature 354, 506 (1991)), DST2, KEM1, XRN1 (Dykstra et al., Molec. Cell.
Biol. 11, 2583 (1991)), STPα/DST1 (Clark et al., Molec. Cell. Biol. 11, 2576 (1991)), HPP-1 (Moore et al., Proc. Natl.
Acad. Sci. (U.S.A.) 88, 9067 (1991)), and other eukaryotic recombinases (Bishop et al., Cell 69, 439 (1992); Shinohara et
al., Cell 69, 457).

[0063] RecA protein forms a nucleoprotein filament when it coats a single-stranded DNA. In this nucleoprotein filament,
one monomer of recA protein is bound to about 3 nucleotides. This property of recA to coat single-stranded DNA is
essentially sequence independent, although particular sequences favor initial loading of recA onto a polynucleotide (e.g.,
nucleation sequences). The nucleoprotein filament(s) can be formed on essentially any DNA to be shuffled and can
form complexes with both single-stranded and double-stranded DNA in prokaryotic and eukaryotic cells.

[0064] Before contacting with recA or other recombinase, fragments are often denatured, e.g., by heat-treatment.
RecA protein is then added at a concentration of about 1-10 µM. After incubation, the recA-coated single-stranded DNA
is introduced into recipient cells by conventional methods, such as chemical transformation or electroporation. In general,
it can be desirable to coat the DNA with a RecA homolog isolated from the organism into which the coated DNA is being
delivered. Recombination involves several cellular factors and the host RecA equivalent generally interacts better with
other host factors than less closely related RecA molecules. The fragments undergo homologous recombination with
cognate endogenous genes. Because of the increased frequency of recombination due to recombinase coating, the
fragments need not be introduced as components of vectors.

[0065] Fragments are sometimes coated with other nucleic acid binding proteins that promote recombination, protect
nucleic acids from degradation, or target nucleic acids to the nucleus. Examples of such proteins include Agrobacterium
virE2 (Durrenberger et al., Proc. Natl. Acad. Sci. USA 86, 9154-9158 (1989)). Alternatively, the recipient strains are
deficient in RecD activity. Single stranded ends can also be generated by 3'-5' exonuclease activity or restriction enzymes
producing 5' overhangs.

1. MutS selection 

[0066] The E. coli mismatch repair protein MutS can be used in affinity chromatography to enrich for fragments of
double-stranded DNA containing at least one base of mismatch. The MutS protein recognizes the bubble formed by the
individual strands about the point of the mismatch. See, e.g., Hsu & Chang, WO 9320233. The strategy of affinity enriching
for partially mismatched duplexes can be incorporated into the present methods to increase the diversity between an
incoming library of fragments and corresponding cognate or allelic genes in recipient cells.

[0067] Fig. 2 shows one scheme in which MutS is used to increase diversity. The DNA substrates for enrichment are
substantially similar to each other but differ at a few sites. For example, the DNA substrates can represent complete or
partial genomes (e.g., a chromosome library) from different individuals with the differences being due to polymorphisms.
The substrates can also represent induced mutants of a wildtype sequence. The DNA substrates are pooled, restriction
digested, and denatured to produce fragments of single-stranded DNA. The single-stranded DNA is then allowed to
reanneal. Some single-stranded fragments reanneal with a perfectly matched complementary strand to generate perfectly
matched duplexes. Other single-stranded fragments anneal to generate mismatched duplexes. The mismatched
duplexes are enriched from perfectly matched duplexes by MutS chromatography (e.g., with MutS immobilized to beads).
The mismatched duplexes recovered by chromatography are introduced into recipient cells for recombination with
cognate endogenous genes as described above. MutS affinity chromatography increases the proportion of fragments differing
from each other and the cognate endogenous gene. Thus, recombination between the incoming fragments and
endogenous genes results in greater diversity.
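
To give a feel for why enrichment matters, the short Python sketch below estimates what fraction of reannealed duplexes would carry a mismatch before MutS chromatography. It rests on simplifying assumptions that are not stated in the text: strands pair at random with complementary strands from the same locus, every distinct variant differs from every other by at least one base, and the pool composition shown is hypothetical.

```python
from collections import Counter


def mismatched_fraction(variants):
    """Expected fraction of heteroduplexes after random reannealing.

    `variants` lists the variant label carried by each fragment copy in the
    pool. A duplex is perfectly matched only when both strands come from the
    same variant, which happens with probability sum(p_i ** 2).
    """
    counts = Counter(variants)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical pool: a dominant wildtype allele plus two rarer mutants.
pool = ["wt"] * 8 + ["mutA"] + ["mutB"]
print(mismatched_fraction(pool))
```

When one allele dominates the pool, most duplexes are perfectly matched, which is the situation in which removing them by MutS affinity gives the largest gain in the proportion of informative fragments.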

[0068] Fig. 3 shows a second strategy for MutS enrichment. In this strategy, the substrates for MutS enrichment
represent variants of a relatively short segment, for example, a gene or cluster of genes, in which most of the different
variants differ at no more than a single nucleotide. The goal of MutS enrichment is to produce substrates for recombination
that contain more variations than sequences occurring in nature. This is achieved by fragmenting the substrates at
random to produce overlapping fragments. The fragments are denatured and reannealed as in the first strategy.
Reannealing generates some mismatched duplexes which can be separated from perfectly matched duplexes by MutS affinity
chromatography. As before, MutS chromatography enriches for duplexes bearing at least a single mismatch. The
mismatched duplexes are then reassembled into longer fragments. This is accomplished by cycles of denaturation,
reannealing, and chain extension of partially annealed duplexes (see Section V). After several such cycles, fragments of the
same length as the original substrates are achieved, except that these fragments differ from each other at multiple sites.
These fragments are then introduced into cells where they undergo recombination with cognate endogenous genes.

2. Positive Selection For Allelic Exchange 

[0069] The invention further provides methods of enriching for cells bearing modified genes relative to the starting
cells. This can be achieved by introducing a DNA fragment library (e.g., a single specific segment or a whole or partial
genomic library) in a suicide vector (i.e., lacking a functional replication origin in the recipient cell type) containing both
positive and negative selection markers. Optionally, multiple fragment libraries from different sources (e.g., B. subtilis,
B. licheniformis and B. cereus) can be cloned into different vectors bearing different selection markers. Suitable positive
selection markers include neoR, kanamycinR, hyg, hisD, gpt, ble, tetR. Suitable negative selection markers include hsv-tk,
hprt, gpt, SacB, ura3 and cytosine deaminase. A variety of examples of conditional replication vectors, mutations affecting
vector replication, limited host range vectors, and counterselectable markers are found in Berg and Berg, supra, and
LaRossa, ibid, and the references therein.

[0070] In one example, a plasmid with R6K and f1 origins of replication, a positively selectable marker (beta-lactamase),
and a counterselectable marker (B. subtilis sacB) was used. Plasmids containing cloned genes, delivered by M13
transduction, were efficiently recombined into the chromosomal copy of that gene in a rep mutant E. coli strain.

[0071] Another strategy for applying negative selection is to include a wildtype rpsL gene (encoding ribosomal protein
S12) in a vector for use in cells having a mutant rpsL gene conferring streptomycin resistance. The mutant form of rpsL
is recessive in cells having wildtype rpsL. Thus, selection for Sm resistance selects against cells having a wildtype copy
of rpsL. See Skorupski & Taylor, Gene 169, 47-52 (1996). Alternatively, vectors bearing only a positive selection marker
can be used with one round of selection for cells expressing the marker, and a subsequent round of screening for cells
that have lost the marker (e.g., screening for drug sensitivity). The screen for cells that have lost the positive selection
marker is equivalent to screening against expression of a negative selection marker. For example, Bacillus can be
transformed with a vector bearing a CAT gene and a sequence to be integrated. See Harwood & Cutting, Molecular
Biological Methods for Bacillus, at pp. 31-33. Selection for chloramphenicol resistance isolates cells that have taken up
vector. After a suitable period to allow recombination, screening for chloramphenicol sensitivity isolates cells which have lost the CAT
gene. About 50% of such cells will have undergone recombination with the sequence to be integrated.

[0072] Suicide vectors bearing a positive selection marker and, optionally, a negative selection marker and a DNA
fragment can integrate into host chromosomal DNA by a single crossover at a site in chromosomal DNA homologous
to the fragment. Recombination generates an integrated vector flanked by direct repeats of the homologous sequence.
In some cells, subsequent recombination between the repeats results in excision of the vector and either acquisition of
a desired mutation from the vector by the genome or restoration of the genome to wildtype.

[0073] In the present methods, after transfer of the gene library cloned in a suitable vector, positive selection is applied
for expression of the positive selection marker. Because nonintegrated copies of the suicide vector are rapidly eliminated
from cells, this selection enriches for cells that have integrated the vector into the host chromosome. The cells surviving
positive selection can then be propagated and subjected to negative selection, or screened for loss of the positive
selection marker. Negative selection selects against cells expressing the negative selection marker. Thus, cells that
have retained the integrated vector express the negative marker and are selectively eliminated. The cells surviving both
rounds of selection are those that initially integrated and then eliminated the vector. These cells are enriched for cells
having genes modified by homologous recombination with the vector. This process diversifies by a single exchange of
genetic information. However, the process can be repeated, either with the same vectors or with a library of fragments
generated by PCR of pooled DNA from the enriched recombinant population, resulting in the diversity of targeted genes
being enhanced exponentially with each round of recombination. This process can be repeated recursively, with selection
being performed as desired.

3. Individualized Optimization of Genes 

[0074] In general, the above methods do not require knowledge of the number of genes to be optimized, their map
location or their function. However, in some instances, where this information is available for one or more genes, it can
be exploited. For example, if the property to be acquired by evolution is enhanced recombination of cells, one gene likely
to be important is recA, even though many other genes, known and unknown, may make additional contributions. In this
situation, the recA gene can be evolved, at least in part, separately from other candidate genes. The recA gene can be
evolved by any of the methods of recursive recombination described in Section V. Briefly, this approach entails obtaining
diverse forms of a recA gene, allowing the forms to recombine, selecting recombinants having improved properties, and
subjecting the recombinants to further cycles of recombination and selection. At any point in the individualized
improvement of recA, the diverse forms of recA can be pooled with fragments encoding other genes in a library to be used in
the general methods described herein. In this way, the library is seeded to contain a higher proportion of variants in a
gene known to be important to the property sought to be acquired than would otherwise be the case.
[0075] In one example (illustrated in Fig. 20B), a plasmid is constructed carrying a non-functional (mutated) version
of a chromosomal gene such as URA3, where the wild-type gene confers sensitivity to a drug (in this case 5-fluoroorotic
acid). The plasmid also carries a selectable marker (resistance to another drug such as kanamycin), and a library of
recA variants. Transformation of the plasmid into the cell results in expression of the recA variants, some of which will
catalyze homologous recombination at an increased rate. Those cells in which homologous recombination occurred are
resistant to the selectable drug on the plasmid, and to 5-fluoroorotic acid because of the disruption of the chromosomal
copy of this gene. The recA variants which give the highest rates of homologous recombination are the most highly
represented in a pool of homologous recombinants. The mutant recA genes can be isolated from this pool by PCR,
reshuffled, cloned back into the plasmid and the process repeated. Other sequences can be inserted in place of recA to
evolve other components of the homologous recombination system.

4. Harvesting DNA Substrates for Shuffling

[0076] In some shuffling methods, DNA substrates are isolated from natural sources and are not easily manipulated
by DNA modifying or polymerizing enzymes due to recalcitrant impurities, which poison enzymatic reactions. Such
difficulties can be avoided by processing DNA substrates through a harvesting strain. The harvesting strain is typically
a cell type with natural competence and a capacity for homologous recombination between sequences with substantial
diversity (e.g., sequences exhibiting only 75% sequence identity). The harvesting strain bears a vector encoding a
negative selection marker flanked by two segments respectively complementary to two segments flanking a gene or
other region of interest in the DNA from a target organism. The harvesting strain is contacted with fragments of DNA
from the target organism. Fragments are taken up by natural competence, or other methods described herein, and a
fragment of interest from the target organism recombines with the vector of the harvesting strain, causing loss of the
negative selection marker. Selection against the negative marker allows isolation of cells that have taken up the fragment
of interest. Shuffling can be carried out in the harvester strain (e.g., a RecE/T strain) or the vector can be isolated from the
harvester strain for in vitro shuffling or transfer to a different cell type for in vivo shuffling. Alternatively, the vector can
be transferred to a different cell type by conjugation, protoplast fusion or electrofusion. An example of a suitable harvester
strain is Acinetobacter calcoaceticus mutS (Melnikov and Youngman, (1999) Nucl. Acids Res. 27(4):1056-1062). This
strain is naturally competent and takes up DNA in a nonsequence-specific manner. Also, because of the mutS mutation,
this strain is capable of homologous recombination of sequences showing only 75% sequence identity.
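The 75% identity figure used above can be made concrete with a small calculation. The following is a minimal sketch and not part of the described method: it assumes the two segments have already been aligned to equal length (gaps written as "-"), and the function name and the two example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns, skipping columns that are gaps in both."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Hypothetical 20-nt flanking segments that differ at 5 positions.
vector_flank = "ATGGCTAGCTAGGATCCGTA"
target_flank = "ATCGCAAGCTTGGATACGAA"
identity = percent_identity(vector_flank, target_flank)
print(f"{identity:.1f}% identity")
```

A pair of flanking segments scoring at or near this value would sit at the divergence level the text attributes to the mutS harvester strain.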

IV. APPLICATIONS 

A. RECOMBINOGENICITY 

[0077] One goal of whole cell evolution is to generate cells having improved capacity for recombination. Such cells
are useful for a variety of purposes in molecular genetics including the in vivo formats of recursive sequence recombination
described in Section V. Almost thirty genes (e.g., recA, recB, recC, recD, recE, recF, recG, recO, recQ, recR, recT,
ruvA, ruvB, ruvC, sbcB, ssb, topA, gyrA and B, lig, polA, uvrD, E, recL, mutD, mutH, mutL, mutT, mutU, helD) and DNA
sites (e.g., chi, recN, sbcC) involved in genetic recombination have been identified in E. coli, and cognate forms of
several of these genes have been found in other organisms (e.g., rad51, rad55-rad57, Dmc1 in yeast (see Kowalczykowski
et al., Microbiol. Rev. 58, 401-465 (1994); Kowalczykowski & Zarling, supra)), and human homologs of Rad51 and Dmc1
have been identified (see Sandler et al., Nucl. Acids Res. 24, 2125-2132 (1996)). At least some of the E. coli genes,
including recA, are functional in mammalian cells, and can be targeted to the nucleus as a fusion with SV40 large T
antigen nuclear targeting sequence (Reiss et al., Proc. Natl. Acad. Sci. USA, 93, 3094-3098 (1996)). Further, mutations
in mismatch repair genes, such as mutL, mutS, mutH, mutT, relax homology requirements and allow recombination
between more diverged sequences (Rayssiguier et al., Nature 342, 396-401 (1989)). The extent of recombination between
divergent strains can be enhanced by impairing mismatch repair genes and stimulating SOS genes. This can be achieved
by use of appropriate mutant strains and/or growth under conditions of metabolic stress, which have been found to
stimulate SOS and inhibit mismatch repair genes (Vulic et al., Proc. Natl. Acad. Sci. USA 94 (1997)). In addition, this can
be achieved by impairing the products of mismatch repair genes by exposure to selective inhibitors.

[0078] Starting substrates for recombination are selected according to the general principles described above. That
is, the substrates can be whole genomes or fractions thereof containing recombination genes or sites. Large libraries
of essentially random fragments can be seeded with collections of fragments constituting variants of one or more known
recombination genes, such as recA. Alternatively, libraries can be formed by mixing variant forms of the various known
recombination genes and sites.

[0079] The library of fragments is introduced into the recipient cells to be improved and recombination occurs,
generating modified cells. The recipient cells preferably contain a marker gene whose expression has been disabled in a
manner that can be corrected by recombination. For example, the cells can contain two copies of a marker gene bearing
mutations at different sites, which copies can recombine to generate the wildtype gene. A suitable marker gene is green
fluorescent protein (GFP). A vector can be constructed encoding one copy of GFP having stop codons near the N-terminus,
and another copy of GFP having stop codons near the C-terminus of the protein. The distance between the stop codons
at the respective ends of the molecule is 500 bp and about 25% of recombination events result in active GFP. Expression
of GFP in a cell signals that the cell is capable of homologous recombination between the stop codons to
generate a contiguous coding sequence. By screening for cells expressing GFP, one enriches for cells having the highest
capacity for recombination. The same type of screen can be used following subsequent rounds of recombination.
However, unless the selection marker used in previous round(s) was present on a suicide vector, subsequent round(s) should
employ a second disabled screening marker within a second vector bearing a different origin of replication or a different
positive selection marker from the vectors used in the previous rounds.
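The logic of the two-copy disabled marker can be illustrated with a toy model. This is a sketch under stated assumptions rather than a description of the actual construct: each copy is reduced to the set of positions at which it carries a stop codon, recombination is modeled as a single crossover at one position, and the gene length (720 bp) and stop positions (100 and 600, i.e. 500 bp apart as in the text) are hypothetical. The model does not attempt to reproduce the "about 25%" figure, which depends on factors not represented here.

```python
def crossover_products(stops_a, stops_b, x):
    """Stop positions carried by the two reciprocal products of one crossover at x.

    Product AB takes copy A left of x and copy B from x onward; BA is the reverse.
    """
    ab = frozenset(p for p in stops_a if p < x) | frozenset(p for p in stops_b if p >= x)
    ba = frozenset(p for p in stops_b if p < x) | frozenset(p for p in stops_a if p >= x)
    return ab, ba


def restores_marker(stops_a, stops_b, x):
    """True if at least one reciprocal product is free of stop codons."""
    return any(len(product) == 0 for product in crossover_products(stops_a, stops_b, x))


GENE_LENGTH = 720               # hypothetical coding-sequence length in bp
N_TERMINAL_STOPS = frozenset({100})   # copy disabled near the N-terminus
C_TERMINAL_STOPS = frozenset({600})   # copy disabled near the C-terminus

productive = [
    x for x in range(1, GENE_LENGTH)
    if restores_marker(N_TERMINAL_STOPS, C_TERMINAL_STOPS, x)
]
print(len(productive), productive[0], productive[-1])
```

In this model only crossovers falling between the two stop positions yield an intact coding sequence, which is why a fluorescent cell reports a recombination event within that interval.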

B. MULTIGENOMIC COPY NUMBER-GENE REDUNDANCY

[0080] The majority of bacterial cells in stationary phase cultures grown in rich media contain two, four or eight genomes.
In minimal medium the cells contain one or two genomes. The number of genomes per bacterial cell thus depends on
the growth rate of the cell as it enters stationary phase. This is because rapidly growing cells contain multiple replication
forks, resulting in several genomes in the cells after termination. The number of genomes is strain dependent, although
all strains tested have more than one chromosome in stationary phase. The number of genomes in stationary phase
cells decreases with time. This appears to be due to fragmentation and degradation of entire chromosomes, similar to
apoptosis in mammalian cells. This fragmentation of genomes in cells containing multiple genome copies results in
massive recombination and mutagenesis. Useful mutants may find ways to use energy sources that will allow them to
continue growing. Multigenome or gene-redundant cells are much more resistant to mutagenesis and can be improved
for a selected trait faster.

[0081] Some cell types, such as Deinococcus radiodurans (Daly and Minton, J. Bacteriol. 177, 5495-5505 (1995)), exhibit
polyploidy throughout the cell cycle. This cell type is highly radiation resistant due to the presence of many copies of the
genome. High frequency recombination between the genomes allows rapid removal of mutations induced by a variety
of DNA damaging agents.

[0082] A goal of the present methods is to evolve other cell types to have increased genome copy number akin to that
of Deinococcus radiodurans. Preferably, the increased copy number is maintained through all or most of the cell cycle in all
or most growth conditions. The presence of multiple genome copies in such cells results in a higher frequency of
homologous recombination in these cells, both between copies of a gene in different genomes within the cell, and
between a genome within the cell and a transfected fragment. The increased frequency of recombination allows the
cells to be evolved more quickly to acquire other useful characteristics.

[0083] Starting substrates for recombination can be a diverse library of genes only a few of which are relevant to
genomic copy number, a focused library formed from variants of gene(s) known or suspected to have a role in genomic
copy number, or a combination of the two. As a general rule one would expect increased copy number would be achieved
by evolution of genes involved in replication and cell septation such that cell septation is inhibited without impairing
replication. Genes involved in replication include tus, xerC, xerD, dif, gyrA, gyrB, parE, parC, dif, TerA, TerB, TerC,
TerD, TerE, TerF, and genes influencing chromosome partitioning and gene copy number include minD, mukA (tolC),
mukB, mukC, mukD, spo0J, spoIIIE (Wake & Errington, Annu. Rev. Genet. 29, 41-67 (1995)). A useful source of
substrates is the genome of a cell type such as Deinococcus radiodurans known to have the desired phenotype of multigenomic
copy number. As well as, or instead of, the above substrates, fragments encoding protein or antisense RNA inhibitors
to genes known to be involved in cell septation can also be used.

[0084] In nature, the existence of multiple genomic copies in a cell type would usually not be advantageous due to
the greater nutritional requirements needed to maintain this copy number. However, artificial conditions can be devised
to select for high copy number. Modified cells having recombinant genomes are grown in rich media (in which conditions,
multicopy number should not be a disadvantage) and exposed to a mutagen, such as ultraviolet or gamma irradiation
or a chemical mutagen, e.g., mitomycin, nitrous acid, photoactivated psoralens, alone or in combination, which induces
DNA breaks amenable to repair by recombination. These conditions select for cells having multicopy number due to the
greater efficiency with which mutations can be excised. Modified cells surviving exposure to mutagen are enriched for
cells with multiple genome copies. If desired, selected cells can be individually analyzed for genome copy number (e.g.,
by quantitative hybridization with appropriate controls). Some or all of the collection of cells surviving selection provide
the substrates for the next round of recombination. In addition, individual cells can be sorted using a cell sorter for those
cells containing more DNA, e.g., using DNA specific fluorescent compounds or sorting for increased size using light
dispersion. Eventually cells are evolved that have at least 2, 4, 6, 8 or 10 copies of the genome throughout the cell cycle.
In a similar manner, protoplasts can also be recombined.

C. SECRETION 

[0085] The protein (or metabolite) secretion pathways of bacterial and eukaryotic cells can be evolved to export desired
molecules more efficiently, such as for the manufacturing of protein pharmaceuticals, small molecule drugs or specialty 
chemicals. Improvements in efficiency are particularly desirable for proteins requiring multisubunit assembly (such as 
antibodies) or extensive posttranslational modification before secretion. 

[0086] The efficiency of secretion may depend on a number of genetic sequences including a signal peptide coding
sequence, sequences encoding protein(s) that cleave or otherwise recognize the coding sequence, and the coding
sequence of the protein being secreted. The latter may affect folding of the protein and the ease with which it can integrate
into and traverse membranes. The bacterial secretion pathway in E. coli includes the SecA, SecB, SecE, SecD and SecF
genes. In Bacillus subtilis, the major genes are secA, secD, secE, secF, secY, ffh, ftsY together with five signal peptidase
genes (sipS, sipT, sipU, sipV and sipW) (Kunst et al., supra). For proteins requiring posttranslational modification, evolution
of genes effecting such modification may contribute to improved secretion. Likewise genes with expression products
having a role in assembly of multisubunit proteins (e.g., chaperonins) may also contribute to improved secretion.
[0087] Selection of substrates for recombination follows the general principles discussed above. In this case, the
focused libraries referred to above comprise variants of the known secretion genes. For evolution of prokaryotic cells to
express eukaryotic proteins, the initial substrates for recombination are often obtained at least in part from eukaryotic
sources. Incoming fragments can undergo recombination both with chromosomal DNA in recipient cells and with the
screening marker construct present in such cells (see below). The latter form of recombination is important for evolution
of the signal coding sequence incorporated in the screening marker construct. Improved secretion can be screened by
the inclusion of a marker construct in the cells being evolved. The marker construct encodes a marker gene, operably
linked to expression sequences, and usually operably linked to a signal peptide coding sequence. The marker gene is
sometimes expressed as a fusion protein with a recombinant protein of interest. This approach is useful when one wants
to evolve the recombinant protein coding sequence together with secretion genes.

[0088] In one variation, the marker gene encodes a product that is toxic to the cell containing the construct unless the
product is secreted. Suitable toxin proteins include diphtheria toxin and ricin toxin. Propagation of modified cells bearing
such a construct selects for cells that have evolved to improve secretion of the toxin. Alternatively, the marker gene can
encode a ligand to a known receptor, and cells bearing the ligand can be detected by FACS using labeled receptor.
Optionally, such a ligand can be operably linked to a phospholipid anchoring sequence that binds the ligand to the cell
membrane surface following secretion. (See commonly owned, copending 08/309,345). In a further variation, secreted
marker protein can be maintained in proximity with the cell secreting it by distributing individual cells into agar drops.
This is done, e.g., by droplet formation of a cell suspension. Secreted protein is confined within the agar matrix and can
be detected by, e.g., FACS. In another variation, a protein of interest is expressed as a fusion protein together with
β-lactamase or alkaline phosphatase. These enzymes metabolize commercially available chromogenic substrates (e.g.,
X-gal), but do so only after secretion into the periplasm. Appearance of colored substrate in a colony of cells therefore
indicates capacity to secrete the fusion protein and the intensity of color is related to the efficiency of secretion.
[0089] The cells identified by these screening and selection methods have the capacity to secrete increased amounts
of protein. This capacity may be attributable to increased secretion and increased expression, or to increased secretion
alone.

1. Expression

[0090] Cells can also be evolved to acquire increased expression of a recombinant protein. The level of expression
is, of course, highly dependent on the construct from which the recombinant protein is expressed and the regulatory
sequences, such as the promoter, enhancer(s) and transcription termination site contained therein. Expression can also
be affected by a large number of host genes having roles in transcription, posttranslational modification and translation.
In addition, host genes involved in synthesis of ribonucleotide and amino acid monomers for transcription and translation
may have indirect effects on efficiency of expression. Selection of substrates for recombination follows the general
principles discussed above. In this case, focused libraries comprise variants of genes known to have roles in expression.
For evolution of prokaryotic cells to express eukaryotic proteins, the initial substrates for recombination are often obtained,
at least in part, from eukaryotic sources; that is, eukaryotic genes encoding proteins such as chaperonins involved in
secretion and/or assembly of proteins. Incoming fragments can undergo recombination both with chromosomal DNA in
recipient cells and with the screening marker construct present in such cells (see below).

[0091] Screening for improved expression can be effected by including a reporter construct in the cells being evolved.
The reporter construct expresses (and usually secretes) a reporter protein, such as GFP, which is easily detected and
nontoxic. The reporter protein can be expressed alone or together with a protein of interest as a fusion protein. If the
reporter gene product is secreted, the screening effectively selects for cells having either improved secretion or improved
expression, or both.

2. Plant Cells 

[0092] A further application of recursive sequence recombination is the evolution of plant cells, and transgenic plants
derived from the same, to acquire resistance to pathogenic diseases (fungi, viruses and bacteria), insects, chemicals
(such as salt, selenium, pollutants, pesticides, herbicides, or the like), including, e.g., atrazine or glyphosate, or to modify
chemical composition, yield or the like. The substrates for recombination can again be whole genomic libraries, fractions
thereof or focused libraries containing variants of gene(s) known or suspected to confer resistance to one of the above
agents. Frequently, library fragments are obtained from a different species from the plant being evolved.
[0093] The DNA fragments are introduced into plant tissues, cultured plant cells, plant microspores, or plant protoplasts
by standard methods including electroporation (Fromm et al., Proc. Natl. Acad. Sci. USA 82, 5824 (1985)), infection by viral
vectors such as cauliflower mosaic virus (CaMV) (Hohn et al., Molecular Biology of Plant Tumors, (Academic Press,
New York, 1982) pp. 549-560; Howell, US 4,407,956), high velocity ballistic penetration by small particles with the nucleic
acid either within the matrix of small beads or particles, or on the surface (Klein et al., Nature 327, 70-73 (1987)), use
of pollen as vector (WO 85/01856), or use of Agrobacterium tumefaciens or A. rhizogenes carrying a T-DNA plasmid
in which DNA fragments are cloned. The T-DNA plasmid is transmitted to plant cells upon infection by Agrobacterium
tumefaciens, and a portion is stably integrated into the plant genome (Horsch et al., Science 233, 496-498 (1984); Fraley
et al., Proc. Natl. Acad. Sci. USA 80, 4803 (1983)).

[0094] Diversity can also be generated by genetic exchange between plant protoplasts according to the same principles
described below for fungal protoplasts. Procedures for formation and fusion of plant protoplasts are described by
Takahashi et al., US 4,677,066; Akagi et al., US 5,360,725; Shimamoto et al., US 5,250,433; Cheney et al., US 5,426,040.
[0095] After a suitable period of incubation to allow recombination to occur and for expression of recombinant genes,
the plant cells are contacted with the agent to which resistance is to be acquired, and surviving plant cells are collected.
Some or all of these plant cells can be subject to a further round of recombination and screening. Eventually, plant cells
having the required degree of resistance are obtained.

[0096] These cells can then be cultured into transgenic plants. Plant regeneration from cultured protoplasts is described
in Evans et al., "Protoplast Isolation and Culture," Handbook of Plant Cell Cultures 1, 124-176 (MacMillan Publishing
Co., New York, 1983); Davey, "Recent Developments in the Culture and Regeneration of Plant Protoplasts," Protoplasts,
(1983) pp. 12-29, (Birkhauser, Basel 1983); Dale, "Protoplast Culture and Plant Regeneration of Cereals and Other
Recalcitrant Crops," Protoplasts (1983) pp. 31-41, (Birkhauser, Basel 1983); Binding, "Regeneration of Plants," Plant
Protoplasts, pp. 21-73, (CRC Press, Boca Raton, 1985).

[0097] In a variation of the above method, one or more preliminary rounds of recombination and screening can be
performed in bacterial cells according to the same general strategy as described for plant cells. More rapid evolution
can be achieved in bacterial cells due to their greater growth rate and the greater efficiency with which DNA can be
introduced into such cells. After one or more rounds of recombination/screening, a DNA fragment library is recovered
from bacteria and transformed into the plant cells. The library can either be a complete library or a focused library. A
focused library can be produced by amplification from primers specific for plant sequences, particularly plant sequences
known or suspected to have a role in conferring resistance.

3. Example: Concatemeric Assembly of Atrazine-Catabolizing Plasmid 

[0098] Pseudomonas atrazine catabolizing genes AtzA and AtzB were subcloned from pMD1 (de Souza et al., Appl.
Environ. Microbiol. 61, 3373-3378 (1995); de Souza et al., J. Bacteriol. 178, 4894-4900 (1996)) into pUC18. A 1.9 kb
AvaI fragment containing AtzA was end-filled and inserted into an AvaI site of pUC18. A 3.9 kb ClaI fragment containing
AtzB was end-filled and cloned into the HincII site of pUC18. AtzA was then excised from pUC18 with EcoRI and BamHI,
AtzB with BamHI and HindIII, and the two inserts were co-ligated into pUC18 digested with EcoRI and HindIII. The result
was a 5.8 kb insert containing AtzA and AtzB in pUC18 (total plasmid size 8.4 kb).
[0099] Recursive sequence recombination was performed as follows. The entire 8.4 kb plasmid was treated with
DNaseI in 50 mM Tris-Cl pH 7.5, 10 mM MnCl2 and fragments between 500 and 2000 bp were gel purified. The fragments
were assembled in a PCR reaction using Tth-XL enzyme and buffer from Perkin Elmer, 2.5 mM MgOAc, 400 µM dNTPs
and serial dilutions of DNA fragments. The assembly reaction was performed in an MJ Research "DNA Engine"
programmed with the following cycles: 1) 94°C, 20 seconds; 2) 94°C, 15 seconds; 3) 40°C, 30 seconds; 4) 72°C, 30 seconds
+ 2 seconds per cycle; 5) go to step 2, 39 more times; 6) 4°C.
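The cycling program above can be tallied to estimate how long the assembly reaction holds at temperature. The following is a rough sketch, not the instrument's own logic: it assumes the 2-second extension increment is first applied in the second cycle (so the first cycle extends for exactly 30 seconds), and it ignores ramp times between temperatures and the final 4°C hold. The function names are hypothetical.

```python
def extension_seconds(cycle_index: int, base: int = 30, increment: int = 2) -> int:
    """Extension time at 72 C for a zero-based cycle index (assumed increment rule)."""
    return base + increment * cycle_index


def assembly_program_seconds(
    cycles: int = 40,          # step 2-4 run once, then 39 more times
    initial_denature: int = 20,
    denature: int = 15,
    anneal: int = 30,
) -> int:
    """Total programmed hold time in seconds, excluding ramps and the 4 C hold."""
    total = initial_denature
    for cycle_index in range(cycles):
        total += denature + anneal + extension_seconds(cycle_index)
    return total


total = assembly_program_seconds()
print(f"last-cycle extension: {extension_seconds(39)} s")
print(f"total hold time: {total} s (~{total / 60:.0f} min)")
```

Under these assumptions the extension step grows from 30 seconds to 108 seconds over the 40 cycles, which accommodates the progressively longer assembly products.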

[0100] The AtzA and AtzB genes were not amplified from the assembly reaction using the polymerase chain reaction,
so instead DNA was purified from the reaction by phenol extraction and ethanol precipitation, and the assembled
DNA was then digested with a restriction enzyme that linearized the plasmid (KpnI: the KpnI site in pUC18 was lost during subcloning,
leaving only the KpnI site in AtzA). Linearized plasmid was gel-purified, self-ligated overnight and transformed into E.
coli strain NM522. (The choice of host strain was relevant: very little plasmid of poor quality was obtained from a number
of other commercially available strains including TG1, DH10B, DH12S.)

[0101] Serial dilutions of the transformation reaction were plated onto LB plates containing 50 µg/ml ampicillin; the
remainder of the transformation was made 25% in glycerol and frozen at -80°C. Once the transformed cells were titered,
the frozen cells were plated at a density of between 200 and 500 on 150 mm diameter plates containing 500 µg/ml
atrazine and grown at 37°C.

[0102] Atrazine at 500 µg/ml forms an insoluble precipitate. The products of the AtzA and AtzB genes transform atrazine
into a soluble product. Cells containing the wild type AtzA and AtzB genes in pUC18 will thus be surrounded by a clear
halo where the atrazine has been degraded. The more active the AtzA and AtzB enzymes, the more rapidly a clear halo
will form and grow on atrazine-containing plates. Positives were picked as those colonies that most rapidly formed the
largest clear zones. The (approximately) 40 best colonies were picked, pooled, grown in the presence of 50 µg/ml
ampicillin and plasmid prepared from them. The entire process (from DNase-treatment to plating on atrazine plates)
was repeated 4 times with 2000-4000 colonies/cycle.
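The plating arithmetic implied by the figures above (200 to 500 colonies per plate, 2000-4000 colonies per cycle) can be sketched as follows. This is an illustrative calculation only: the titer value is hypothetical, since the text reports no titer, and the function names are not from the source.

```python
import math


def plating_volume_ul(titer_cfu_per_ul: float, target_colonies: int) -> float:
    """Volume of thawed stock expected to give the target colony count on one plate."""
    if titer_cfu_per_ul <= 0:
        raise ValueError("titer must be positive")
    return target_colonies / titer_cfu_per_ul


def plates_needed(colonies_per_cycle: int, colonies_per_plate: int) -> int:
    """Number of plates needed to screen a given number of colonies."""
    return math.ceil(colonies_per_cycle / colonies_per_plate)


hypothetical_titer = 4.0  # cfu per microliter; assumed for illustration only
print(plating_volume_ul(hypothetical_titer, 350))  # microliters for ~350 colonies
print(plates_needed(2000, 500), plates_needed(4000, 200))  # fewest and most plates
```

With the densities given in the text, a cycle of 2000-4000 colonies corresponds to somewhere between 4 and 20 of the 150 mm plates.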

[0103] A modification was made in the fourth round. Cells were plated on both 500 µg/ml atrazine, and 500 µg/ml of
the atrazine analogue terbutylazine, which was undegradable by the wild type AtzA and AtzB genes. Positives were
obtained that degraded both compounds. The atrazine chlorohydrolase activity (product of the AtzA gene) was 10-100 fold higher
than that produced by the wildtype gene.

D. PLANT GENOME SHUFFLING

[0104] Plant genome shuffling allows recursive cycles to be used for the introduction and recombination of genes or
pathways that confer improved properties to desired plant species. Any plant species, including weeds and wild cultivars,
showing a desired trait, such as herbicide resistance, salt tolerance, pest resistance, or temperature tolerance, can be
used as the source of DNA that is introduced into the crop or horticultural host plant species.

[0105] Genomic DNA prepared from the source plant is fragmented (e.g., by DNaseI, restriction enzymes, or
mechanically) and cloned into a vector suitable for making plant genomic libraries, such as pGA482 (An, G., 1995, Methods
Mol. Biol. 44:47-58). This vector contains the A. tumefaciens left and right borders needed for gene transfer to plant
cells and antibiotic markers for selection in E. coli, Agrobacterium, and plant cells. A multicloning site is provided for
insertion of the genomic fragments. A cos sequence is present for the efficient packaging of DNA into bacteriophage
lambda heads for transfection of the primary library into E. coli. The vector accepts DNA fragments of 25-40 kb.
[0106] The primary library can also be directly electroporated into an A. tumefaciens or A. rhizogenes strain that is
used to infect and transform host plant cells (Main, GD et al., 1995, Methods Mol. Biol. 44:405-412). Alternatively, DNA
can be introduced by electroporation or PEG-mediated uptake into protoplasts of the recipient plant species (Bilang et
al. (1994) Plant Mol. Biol. Manual, Kluwer Academic Publishers, A1:1-16) or by particle bombardment of cells or tissues
(Christou, ibid, A2:1-15). If necessary, antibiotic markers in the T-DNA region can be eliminated, as long as selection
for the trait is possible, so that the final plant products contain no antibiotic genes.

[0107] Stably transformed whole cells acquiring the trait are selected on solid or liquid media containing the agent to
which the introduced DNA confers resistance or tolerance. If the trait in question cannot be selected for directly,
transformed cells can be selected with antibiotics and allowed to form callus or regenerated to whole plants and then screened
for the desired property.

[0108] The second and further cycles consist of isolating genomic DNA from each transgenic line and introducing it
into one or more of the other transgenic lines. In each round, transformed cells are selected or screened for incremental
improvement. To speed the process of using multiple cycles of transformation, plant regeneration can be deferred until
the last round. Callus tissue generated from the protoplasts or transformed tissues can serve as a source of genomic
DNA and new host cells. After the final round, fertile plants are regenerated and the progeny are selected for homozygosity
of the inserted DNAs. Ultimately, a new plant is created that carries multiple inserts which additively or synergistically
combine to confer high levels of the desired trait. Alternatively, microspores can be isolated as homozygotes generated
from spontaneous diploids.

[0109] In addition, the introduced DNA that confers the desired trait can be traced because it is flanked by known
sequences in the vector. Either PCR or plasmid rescue is used to isolate the sequences and characterize them in more
detail. Long PCR (Foord, OS and Rose, EA, 1995, PCR Primer: A Laboratory Manual, CSHL Press, pp 63-77) of the
full 25-40 kb insert is achieved with the proper reagents and techniques using as primers the T-DNA border sequences.
If the vector is modified to contain the E. coli origin of replication and an antibiotic marker between the T-DNA borders,
a rare cutting restriction enzyme, such as NotI or SfiI, that cuts only at the ends of the inserted DNA is used to create
fragments containing the source plant DNA that are then self-ligated and transformed into E. coli where they replicate
as plasmids. The total DNA or subfragment of it that is responsible for the transferred trait can be subjected to in vitro
evolution by DNA shuffling. The shuffled library can be reiteratively recombined by any method herein and then introduced
into host plant cells and screened for improvement of the trait. In this way, single and multigene traits can be transferred
from one species to another and optimized for higher expression or activity leading to whole organism improvement.
This entire process can also be reiteratively repeated.

[0110] Alternatively, the cells can be transformed microspores with the regenerated haploid plants being screened
directly for improved traits as noted below.

E. MICROSPORE MANIPULATION 

[0111] Microspores are haploid (1n) male spores that develop into pollen grains. Anthers contain large numbers of
microspores in early-uninucleate to first-mitosis stages. Microspores have been successfully induced to develop into
plants for most species, such as, e.g., rice (Chen, CC 1977 In Vitro. 13: 484-489), tobacco (Atanassov, I. et al. 1998
Plant Mol Biol. 38:1169-1178), Tradescantia (Savage JRK and Papworth DG. 1998 Mutat Res. 422:313-322), Arabidopsis
(Park SK et al. 1998 Development. 125:3789-3799), sugar beet (Majewska-Sawka A and Rodriguez-Garcia MI 1996 J
Cell Sci. 109:859-866), barley (Olsen FL 1991 Hereditas 115:255-266) and oilseed rape (Boutilier KA et al. 1994 Plant
Mol Biol. 26:1711-1723).

[0112] The plants derived from microspores are predominantly haploid or diploid (infrequently polyploid and aneuploid).
The diploid plants are homozygous and fertile and can be generated in a relatively short time. Microspores obtained
from F1 hybrid plants represent great diversity, thus being an excellent model for studying recombination. In addition,
microspores can be transformed with T-DNA introduced by Agrobacterium or other available means and then regenerated
into individual plants. Furthermore, protoplasts can be made from microspores and they can be fused in a manner similar to what
occurs in fungi and bacteria.

[0113] Microspores, due to their complex ploidy and regenerating ability, provide a tool for plant whole genome shuffling.
For example, if pollens from 4 parents are collected and pooled, and then used to randomly pollinate the parents, the
progenies should have 2^4 = 16 possible combinations. Assuming this plant has 7 chromosomes, microspores collected
from the 16 progenies will represent 2^7 x 16 = 2048 possible chromosomal combinations. This number is even greater if
meiotic processes occur. When diploid, homozygous embryos are generated from these microspores, in many cases,
they are screened for desired phenotypes, such as herbicide or disease resistance. In addition, for plant oil composition
these embryos can be dissected into two halves: one for analysis, the other for regeneration into a viable plant.
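The counting in the paragraph above can be checked with a few lines of arithmetic. This is a sketch under the simplifying assumptions the text itself makes: every pollen-parent by seed-parent pairing counts as one combination (4 x 4 = 16), each of the 7 chromosomes in a microspore comes from one of the two parental homologs independently, and crossing over is ignored. The function names are hypothetical.

```python
def progeny_combinations(n_parents: int) -> int:
    """Ordered pollen-parent x seed-parent pairings from pooled pollen."""
    return n_parents * n_parents


def microspore_combinations(n_parents: int, n_chromosomes: int) -> int:
    """Chromosomal combinations among microspores, ignoring crossing over."""
    per_progeny = 2 ** n_chromosomes  # one of two homologs per chromosome
    return per_progeny * progeny_combinations(n_parents)


print(progeny_combinations(4))          # pairings among 4 parents
print(microspore_combinations(4, 7))    # with 7 chromosomes per haploid set
```

Because crossing over creates new chromosomes in addition to reassorting existing ones, these values are lower bounds, consistent with the remark that the number is even greater if meiotic processes occur.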
[0114] Protoplasts generated from microspores (especially the haploid ones) are pooled and fused. Microspores
obtained from plants generated by protoplast fusion are pooled and fused again, increasing the genetic diversity of the
resulting microspores.

[0115] Microspores can be subjected to mutagenesis in various ways, such as by chemical mutagenesis,
radiation-induced mutagenesis and, e.g., T-DNA transformation, prior to fusion or regeneration. New mutations which are generated
can be recombined through the recursive processes described above and herein.

F. EXAMPLE: ACQUISITION OF SALT TOLERANCE 

[0116] As depicted in Fig. 21, DNA from a salt tolerant plant is isolated and used to create a genomic library. Protoplasts
made from the recipient species are transformed/transfected with the genomic library (e.g., by electroporation,
Agrobacterium, etc.). Cells are selected on media with a normally inhibitory level of NaCl. Only the cells with newly acquired salt
tolerance will grow into callus tissue. The best lines are chosen and genomic libraries are made from their pooled DNA.
These libraries are transformed into protoplasts made from the first round transformed calli. Again, cells are selected
on increased salt concentrations. After the desired level of salt tolerance is achieved, the callus tissue can be induced
to regenerate whole plants. Progeny of these plants are typically analyzed for homozygosity of the inserts to ensure
stability of the acquired trait. At the indicated steps, plant regeneration or isolation and shuffling of the introduced genes
can be added to the overall protocol.
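The recursive structure of this example (select on salt, pool the best lines, reintroduce their DNA, select at a higher salt level) can be sketched as a toy loop. Everything numeric here is invented for illustration: the inserts "a" to "d", their tolerance contributions, the baseline, and the salt schedule are hypothetical, and the model assumes purely additive effects, whereas the text allows additive or synergistic combination. Recombination is modeled simply as the union of inserts from two selected lines.

```python
from itertools import combinations

BASELINE_MM = 50  # hypothetical NaCl tolerance of the untransformed host, in mM
INSERT_EFFECT_MM = {"a": 40, "b": 25, "c": 10, "d": 0}  # hypothetical additive gains


def tolerance(line: frozenset) -> int:
    """NaCl level (mM) a line tolerates under the additive toy model."""
    return BASELINE_MM + sum(INSERT_EFFECT_MM[i] for i in line)


def select(lines, salt_mm: int, keep: int):
    """Keep up to `keep` best lines among those that grow at salt_mm."""
    survivors = [line for line in lines if tolerance(line) >= salt_mm]
    return sorted(survivors, key=lambda l: (-tolerance(l), sorted(l)))[:keep]


def recombine(selected):
    """Pool selected lines: each line can receive the inserts of any other."""
    pooled = set(selected)
    pooled.update(a | b for a, b in combinations(selected, 2))
    return pooled


def evolve(lines, salt_schedule_mm, keep: int = 3):
    """Run one selection per salt level, recombining between rounds."""
    selected = []
    for round_index, salt in enumerate(salt_schedule_mm):
        selected = select(lines, salt, keep)
        if round_index < len(salt_schedule_mm) - 1:
            lines = recombine(selected)
    return selected


first_round = [frozenset(i) for i in "abcd"]  # one insert per primary transformant
final = evolve(first_round, salt_schedule_mm=[60, 100, 120])
print([(sorted(line), tolerance(line)) for line in final])
```

In this toy run no single insert survives the final salt level; only the line that accumulates three inserts over successive rounds does, which is the behavior the recursive protocol is intended to produce.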

G. TRANSGENIC ANIMALS 

1. Transgene Optimization

[0117] One goal of transgenesis is to produce transgenic animals, such as mice, rabbits, sheep, pigs, goats, and
cattle, secreting a recombinant protein in the milk. A transgene for this purpose typically comprises in operable linkage
a promoter and an enhancer from a milk-protein gene (e.g., α, β, or γ casein, β-lactoglobulin, acid whey protein or
α-lactalbumin), a signal sequence, a recombinant protein coding sequence and a transcription termination site. Optionally,
a transgene can encode multiple chains of a multichain protein, such as an immunoglobulin, in which case, the two
chains are usually individually operably linked to sets of regulatory sequences. Transgenes can be optimized for
expression and secretion by recursive sequence recombination. Suitable substrates for recombination include regulatory
sequences such as promoters and enhancers from milk-protein genes from different species or individual animals.
Cycles of recombination can be performed in vitro or in vivo by any of the formats discussed in Section V. Screening is
performed in vivo on cultures of mammary-gland derived cells, such as HC11 or MacT, transfected with transgenes and
reporter constructs such as those discussed above. After several cycles of recombination and screening, transgenes
resulting in the highest levels of expression and secretion are extracted from the mammary gland tissue culture cells
and used to transfect embryonic cells, such as zygotes and embryonic stem cells, which are matured into transgenic
animals.

2. Whole Animal Optimization 


[0118] In this approach, libraries of incoming fragments are transformed into embryonic cells, such as ES cells or 
zygotes. The fragments can be variants of a gene known to confer a desired property, such as growth hormone. Alter- 
natively, the fragments can be partial or complete genomic libraries including many genes. 

[0119] Fragments are usually introduced into zygotes by microinjection as described by Gordon et al., Methods
Enzymol. 101, 414 (1984); Hogan et al., Manipulation of the Mouse Embryo: A Laboratory Manual (C.S.H.L. N.Y., 1986)
(mouse embryo); Hammer et al., Nature 315, 680 (1985) (rabbit and porcine embryos); Gandolfi et al., J. Reprod.
Fert. 81, 23-28 (1987); Rexroad et al., J. Anim. Sci. 66, 947-953 (1988) (ovine embryos); and Eyestone et al., J. Reprod.
Fert. 85, 715-720 (1989); Camous et al., J. Reprod. Fert. 72, 779-785 (1984); and Heyman et al., Theriogenology 27,
5968 (1987) (bovine embryos). Zygotes are then matured and introduced into recipient female animals which gestate
the embryo and give birth to a transgenic offspring.

[0120] Alternatively, transgenes can be introduced into embryonic stem (ES) cells. These cells are obtained from
preimplantation embryos cultured in vitro. Bradley et al., Nature 309, 255-258 (1984). Transgenes can be introduced
into such cells by electroporation or microinjection. Transformed ES cells are combined with blastocysts from a
non-human animal. The ES cells colonize the embryo and in some embryos form the germ line of the resulting chimeric
animal. See Jaenisch, Science 240, 1468-1474 (1988).

[0121] Regardless of whether zygotes or ES cells are used, screening is performed on whole animals for a desired property,
such as increased size and/or growth rate. DNA is extracted from animals having evolved toward acquisition of the
desired property. This DNA is then used to transfect further embryonic cells. These cells can also be obtained from
animals that have evolved toward the desired property, in a split and pool approach. That is, DNA from one subset of
such animals is transformed into embryonic cells prepared from another subset of the animals. Alternatively, the DNA
from animals that have evolved toward acquisition of the desired property can be transfected into fresh embryonic cells.
In either alternative, transfected cells are matured into transgenic animals, and the animals are subjected to a further round
of screening for the desired property.

[0122] Fig. 4 shows the application of this approach for evolving fish toward a larger size. Initially, a library is prepared
of variants of a growth hormone gene. The variants can be natural or induced. The library is coated with recA protein
and transfected into fertilized fish eggs. The fish eggs then mature into fish of different sizes. The growth hormone gene
fragment of genomic DNA from large fish is then amplified by PCR and used in the next round of recombination.
Alternatively, fish α-IFN is evolved to enhance resistance to viral infections as described below.

3. Evolution of improved hormones for expression in transgenic animals (e.g., Fish) to create animals with improved traits.

[0123] Hormones and cytokines are key regulators of size, body weight, viral resistance and many other commercially
important traits. DNA shuffling is used to rapidly evolve the genes for these proteins using in vitro assays. This was
demonstrated with the evolution of the human alpha interferon genes to have potent antiviral activity on murine cells.
Large improvements in activity were achieved in two cycles of family shuffling of the human IFN genes.

[0124] In general, a method of increasing resistance to virus infection in cells can be performed by first introducing a
shuffled library comprising at least one shuffled interferon gene into animal cells to create an initial library of animal cells
or animals. The initial library is then challenged with the virus. Animal cells or animals which are resistant to the virus
are selected from the initial library, and a plurality of transgenes is recovered from a plurality of the resistant animal
cells or animals. The recovered transgenes are recombined to produce an evolved library of animal cells or
animals, which is again challenged with the virus. Cells or animals which are resistant to the virus are then selected
from the evolved library.

[0125] For example, genes evolved with in vitro assays are introduced into the germplasm of animals or plants to
create improved strains. One limitation of this procedure is that in vitro assays are often only crude predictors of in vivo
activity. However, with improving methods for the production of transgenic plants and animals, one can now marry whole
organism breeding with molecular breeding. The approach is to introduce shuffled libraries of hormone genes into the
species of interest. This can be done with a single gene per transgenic or with pools of genes per transgenic. Progeny
are then screened for the phenotype of interest. In this case, shuffled libraries of interferon genes (alpha IFN for example)
are introduced into transgenic fish. The library of transgenic fish is challenged with a virus. The most resistant fish are
identified (i.e., either survivors of a lethal challenge, or those that are deemed most "healthy" after the challenge). The
IFN transgenes are recovered by PCR and shuffled in either a poolwise or a pairwise fashion. This generates an evolved
library of IFN genes. A second library of transgenic fish is created and the process is repeated. In this way, IFN is evolved
for improved antiviral activity in a whole organism assay.

[0126] This procedure is general and can be applied to any trait that is affected by a gene or gene family of interest
and which can be quantitatively measured.
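The recursive cycle of challenge, recovery and shuffling described above can be sketched as a toy simulation. This is only an illustrative model under stated assumptions, not the procedure of the specification: genes are bit lists, the "challenge" is replaced by a hypothetical score (the number of 1 bits), and the recovered genes are carried over unchanged into the next library alongside their shuffled offspring. All function names and parameter values are invented for illustration.

```python
import random


def shuffle_pool(genes, n_offspring, rng):
    """Poolwise recombination: each position is copied from a randomly chosen parent gene."""
    length = len(genes[0])
    return [[rng.choice(genes)[i] for i in range(length)] for _ in range(n_offspring)]


def evolve(library_size=40, gene_length=20, rounds=4, survivors=8, seed=1):
    """Return the best score seen at the start of each round, plus the final best score."""
    rng = random.Random(seed)
    # Initial library: random variants (hypothetical stand-in for a shuffled IFN library).
    library = [[rng.randint(0, 1) for _ in range(gene_length)] for _ in range(library_size)]
    best_scores = []
    for _ in range(rounds):
        # "Challenge": rank library members by score (stand-in for resistance to the virus).
        ranked = sorted(library, key=sum, reverse=True)
        best_scores.append(sum(ranked[0]))
        # "Recover" the genes of the most resistant members.
        recovered = ranked[:survivors]
        # Evolved library: recovered genes (assumption) plus their poolwise-shuffled offspring.
        library = recovered + shuffle_pool(recovered, library_size - survivors, rng)
    best_scores.append(max(sum(gene) for gene in library))
    return best_scores


if __name__ == "__main__":
    print(evolve())
```

Because the recovered genes are retained in each evolved library, the best score in this sketch can never decrease from one round to the next; a real whole-organism screen offers no such guarantee.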

[0127] Fish interferon sequence data is available for the Japanese flatfish (Paralichthys olivaceus) as mRNA sequence
(Tamai et al. (1993) "Cloning and expression of flatfish (Paralichthys olivaceus) interferon cDNA" Biochim. Biophys.
Acta 1174, 182-186; see also, Tamai et al. (1993) "Purification and characterization of interferon-like antiviral protein
derived from flatfish (Paralichthys olivaceus) lymphocytes immortalized by oncogenes." Cytotechnology 1993; 11(2):
121-131). This sequence can be used to clone out IFN genes from this species. This sequence can also be used as a
probe to clone homologous interferons from additional species of fish. As well, additional sequence information can be
utilized to clone out more species of fish interferons. Once a library of interferons has been cloned, these can be family
shuffled to generate a library of variants.

[0128] A protein sequence of flatfish interferon is:

MIRSTNSNKS DILMNCHHUIR YDDNSAPSGGSL FRKMIMLLKL LKLTTFGQLRW ELFVKSNTSKTS TVLSIDGSNLISL 
LDAPKDILDKPSCNSF QLDLLLASSAWTLLT ARLLNYPYPA VLLSAGVASVVLVQVP. 

[0129] In one embodiment, BHK-21 (a fibroblast cell line from hamster) can be transfected with the shuffled IFN
expression plasmids. Active recombinant IFN is produced and then purified by WGA agarose affinity chromatography
(Tamai et al. 1993 Biochim. Biophys. Acta, supra). The antiviral activity of IFN can be measured on fish cells challenged
by rhabdovirus (Tamai et al. (1993) "Purification and characterization of interferon-like antiviral protein derived from flatfish
(Paralichthys olivaceus) lymphocytes immortalized by oncogenes." Cytotechnology 1993; 11(2):121-131).

H. WHOLE GENOME SHUFFLING IN HIGHER ORGANISMS - POOLWISE RECURSIVE BREEDING

[0130] The present invention provides a procedure for generating large combinatorial libraries of higher eukaryotes,
such as plants, fish, domesticated animals, etc. In addition to the procedures outlined above, poolwise combination of male and
female gametes can also be used to generate large diverse molecular libraries.
[0131] In one aspect, the process includes recursive poolwise matings for several generations without any deliberate
screening. This is similar to classical breeding, except that pools of organisms, rather than pairs of organisms, are mated,
thereby accelerating the generation of genetic diversity.

[0132] This method is similar to recursive fusion of a diverse population of bacterial protoplasts resulting in the
generation of multiparent progeny harboring genetic information from all of the starting population of bacteria. The process
described here is to perform analogous artificial or natural matings of large populations of natural isolates, employing a
split pool mating strategy. Before mating, all of the male gametes, i.e., pollen, sperm, etc., are isolated from the starting
population and pooled. These are then used to "self" fertilize a mixed pool of the female gametes from the same population.
[0133] The process is repeated with the subsequent progeny for several generations, with the final progeny being a
combinatorial organism library with each member having genetic information originating from many if not all of the starting
"parents." This process generates large diverse organism libraries on which many selections and/or screens can be
performed, and it does not require sophisticated in vitro manipulation of genes. However, it results in the creation of useful
new strains (perhaps well diluted in the population) in a much shorter time frame than such organisms could be generated
using a classical targeted breeding approach.
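The way recursive poolwise mating spreads genetic information from many starting "parents" into each member of the library can be illustrated with a small simulation. This is a simplified sketch under several assumptions that are not in the specification: genomes are haploid lists of unlinked loci, every locus of a founder carries that founder's label, each offspring has two distinct randomly drawn parents (so self fertilization is excluded), and the population size is held constant. The names and default values are hypothetical.

```python
import random


def mean_founders(population):
    """Average number of distinct founder labels carried per individual."""
    return sum(len(set(genome)) for genome in population) / len(population)


def pooled_breeding(n_founders=50, n_loci=200, generations=6, seed=0):
    """Track mean founder ancestry per individual across generations of pooled mating."""
    rng = random.Random(seed)
    # Generation 0: every locus of founder i carries the label i.
    population = [[i] * n_loci for i in range(n_founders)]
    history = [mean_founders(population)]
    for _ in range(generations):
        offspring = []
        for _ in range(n_founders):
            # Draw two distinct parents from the pooled population (no selfing).
            parent_a, parent_b = rng.sample(population, 2)
            # Each unlinked locus is inherited from one parent or the other at random.
            child = [rng.choice(pair) for pair in zip(parent_a, parent_b)]
            offspring.append(child)
        population = offspring
        history.append(mean_founders(population))
    return history


if __name__ == "__main__":
    for generation, value in enumerate(pooled_breeding()):
        print(generation, round(value, 2))
```

In this model an individual can carry information from at most 2^g founders after g generations, which is why a handful of cycles is enough to mix a starting collection of modest size.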

[0134] These libraries are generated relatively quickly (e.g., typically in less than three years for most plants of
commercial interest, with six cycles or less of recursive breeding being sufficient to generate desired diversity).

[0135] An additional benefit of these methods is that the resulting libraries provide organismal diversity in areas, such
as agriculture, aquaculture, and animal husbandry, that are currently genetically homogeneous.
[0136] Examples of these methods for several organisms are described below.

1. Plants

[0137] A population of plants, for example all of the different corn strains in a commercial seed/germplasm collection,
is grown and the pollen from the entire population is harvested and pooled. This mixed pollen population is then used
to "self" fertilize the same population. Self pollination is prevented, so that the fertilization is combinatorial. The cross
results in all pairwise crosses possible within the population, and the resulting seeds represent many of the possible
outcomes of each of these pairwise crosses. The seeds from the fertilized plants are then harvested, pooled, planted,
and the pollen is again harvested, pooled, and used to "self" fertilize the population. After only several generations, the
resulting population is a very diverse combinatorial library of corn. The seeds from this library are harvested and screened
for desirable traits, e.g., salt tolerance, growth rate, productivity, yield, disease resistance, etc. Essentially any plant
collection can be modified by this approach. Important commercial crops include both monocots and dicots. Monocots
include plants in the grass family (Gramineae), such as plants in the subfamilies Festucoideae and Poacoideae, which
together include several hundred genera including plants in the genera Agrostis, Phleum, Dactylis, Sorghum, Setaria,
Zea (e.g., corn), Oryza (e.g., rice), Triticum (e.g., wheat), Secale (e.g., rye), Avena (e.g., oats), Hordeum (e.g., barley),
Saccharum, Poa, Festuca, Stenotaphrum, Cynodon, Coix, the Olyreae, Phareae and many others. Plants in the family
Gramineae are particularly preferred target plants for the methods of the invention. Additional preferred targets include
other commercially important crops, e.g., from the families Compositae (the largest family of vascular plants, including
at least 1,000 genera, including important commercial crops such as sunflower), and Leguminosae or "pea family," which
includes several hundred genera, including many commercially valuable crops such as pea, beans, lentil, peanut, yam
bean, cowpeas, velvet beans, soybean, clover, alfalfa, lupine, vetch, lotus, sweet clover, wisteria, and sweetpea. Common
crops applicable to the methods of the invention include Zea mays, rice, soybean, sorghum, wheat, oats, barley, millet,
sunflower, and canola.

[0138] This process can also be carried out using pollen from different species or more divergent strains (e.g., crossing
the ancient grasses with corn). Different plant species can be forced to cross. Only a few plants from an initial cross
would have to result in order to make the process viable. These few progeny, e.g., from a cross between soybean and
corn, would generate pollen and eggs, each of which would represent a different meiotic outcome from the recombination
of the two genomes. The pollen would be harvested and used to "self" pollinate the original progeny. This process would
then be carried out recursively. This would generate a large family shuffled library of two or more species, which could
be subsequently screened.

[0139] The above strategy is illustrated schematically in Figure 30. 

2. Fish 

[0140] The natural tendency of fish to lay their eggs outside of the body and to have a male cover those eggs with
sperm provides another opportunity for a split pooled breeding strategy. The eggs from many different fish, e.g., salmon
from different fisheries about the world, can be harvested, pooled, and then fertilized with similarly collected and pooled
salmon sperm. The fertilization will result in all of the possible pairwise matings of the starting population. The resulting
progeny are then grown and again the sperm and eggs are harvested and pooled, with each egg and sperm representing
a different meiotic outcome of the different crosses. The pooled sperm are then used to fertilize the pooled eggs and
the process is carried out recursively. After several generations the resulting progeny can then be subjected to selections
and screens for desired properties, such as size, disease resistance, etc.
[0141] The above strategy is illustrated schematically in Figure 29. 

3. Animals 

[0142] The advent of in vitro fertilization and surrogate motherhood provides a means of whole genome shuffling in
animals such as mammals. As with fish, the eggs and the sperm from a population, for example from all slaughter cows,
are collected and pooled. The pooled eggs are then in vitro fertilized with the pooled sperm. The resulting embryos are
then returned to surrogate mothers for development. As above, this process is repeated recursively until a large diverse
population is generated that can be screened for desirable traits.

[0143] A technically feasible approach would be similar to that used for plants. In this case, sperm from the males of
the starting population is collected and pooled, and then this pooled sample is used to artificially inseminate multiple
females from each of the starting populations. Only one (or a few) sperm would succeed in each animal, but these should
be different for each fertilization. The process is reiterated by harvesting the sperm from all of the male progeny, pooling
it, and using it to fertilize all of the female progeny. The process is carried out recursively for several generations to
generate the organism library, which can then be screened.

I. RAPID EVOLUTION AS A PREDICTIVE TOOL 


[0144] Recursive sequence recombination can be used to simulate natural evolution of pathogenic microorganisms
in response to exposure to a drug under test. Using recursive sequence recombination, evolution proceeds at a faster
rate than in natural evolution. One measure of the rate of evolution is the number of cycles of recombination and screening
required until the microorganism acquires a defined level of resistance to the drug. The information from this analysis
is of value in comparing the relative merits of different drugs and in particular, in predicting their long term efficacy on
repeated administration.
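The measure described above, the number of cycles needed to reach a defined level of resistance, can be expressed as a small helper. This is a minimal sketch; the function name and the resistance values in the usage lines are invented purely for illustration and are not data from the specification.

```python
def cycles_to_resistance(best_resistance_per_cycle, threshold):
    """Return the first cycle (1-based) whose best isolate meets the threshold, else None."""
    for cycle, level in enumerate(best_resistance_per_cycle, start=1):
        if level >= threshold:
            return cycle
    return None


if __name__ == "__main__":
    # Hypothetical resistance levels (arbitrary units) of the best survivor after each cycle.
    drug_a = [1, 2, 4, 8, 16]
    drug_b = [1, 1.5, 2]
    print(cycles_to_resistance(drug_a, threshold=8))  # reached in cycle 4
    print(cycles_to_resistance(drug_b, threshold=8))  # None: not reached in the cycles run
```

Under this measure, a drug for which the threshold is reached only after many cycles, or not at all, would be expected to retain its efficacy longer.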

[0145] The pathogenic microorganisms used in this analysis include the bacteria that are a common source of human
infections, such as chlamydia, rickettsial bacteria, mycobacteria, staphylococci, streptococci, pneumococci,
meningococci and gonococci, klebsiella, proteus, serratia, pseudomonas, legionella, diphtheria, salmonella, bacilli, cholera,
tetanus, botulism, anthrax, plague, leptospirosis, and Lyme disease bacteria. Evolution is effected by transforming an
isolate of bacteria that is sensitive to a drug under test with a library of DNA fragments. The fragments can be a mutated
version of the genome of the bacteria being evolved. If the target of the drug is a known protein or nucleic acid, a focused
library containing variants of the corresponding gene can be used. Alternatively, the library can come from other kinds
of bacteria, especially bacteria typically found inhabiting human tissues, thereby simulating the source material available
for recombination in vivo. The library can also come from bacteria known to be resistant to the drug. After transformation
and propagation of bacteria for an appropriate period to allow for recombination to occur and recombinant genes to be
expressed, the bacteria are screened by exposing them to the drug under test and then collecting survivors. Surviving
bacteria are subject to further rounds of recombination. The subsequent round can be effected by a split and pool
approach in which DNA from one subset of surviving bacteria is introduced into a second subset of bacteria. Alternatively,
a fresh library of DNA fragments can be introduced into surviving bacteria. Subsequent round(s) of selection can be
performed at increasing concentrations of drug, thereby increasing the stringency of selection.
[0146] A similar strategy can be used to simulate viral acquisition of drug resistance. The object is to identify drugs
for which resistance can be acquired only slowly, if at all. The viruses to be evolved are those that cause infections in
humans for which at least modestly effective drugs are available. Substrates for recombination can come from induced
mutants, natural variants of the same viral strain or different viruses. If the target of the drug is known (e.g., nucleotide
analogs which inhibit the reverse transcriptase gene of HIV), focused libraries containing variants of the target gene can
be produced. Recombination of a viral genome with a library of fragments is usually performed in vitro. However, in
situations in which the library of fragments constitutes variants of viral genomes or fragments that can be encompassed
in such genomes, recombination can also be performed in vivo, e.g., by transfecting cells with multiple substrate copies
(see Section V). For screening, recombinant viral genomes are introduced into host cells susceptible to infection by the
virus and the cells are exposed to a drug effective against the virus (initially at low concentration). The cells can be spun
to remove any noninfected virus. After a period of infection, progeny viruses can be collected from the culture medium,
the progeny viruses being enriched for viruses that have acquired at least partial resistance to the drug. Alternatively,
virally infected cells can be plated in a soft agar lawn and resistant viruses isolated from plaques. Plaque size provides
some indication of the degree of viral resistance.

[0147] Progeny viruses surviving screening are subject to additional rounds of recombination and screening at
increased stringency until a predetermined level of drug resistance has been acquired. The predetermined level of drug
resistance may reflect the maximum dosage of a drug practical to administer to a patient without intolerable side effects.
The analysis is particularly valuable for investigating acquisition of resistance to various combinations of drugs, such as
the growing list of approved anti-HIV drugs (e.g., AZT, ddI, ddC, d4T, TIBO 82150, nevirapine, 3TC, crixivan and ritonavir).

J. THE EVOLUTIONARY IMPORTANCE OF RECOMBINATION 


[0148] Strain improvement is the directed evolution of an organism to be more "fit" for a desired task. In nature,
adaptation is facilitated by sexual recombination. Sexual recombination allows a population to exploit the genetic diversity
within it, e.g., by consolidating useful mutations and discarding deleterious ones. In this way, adaptation and evolution
can proceed in leaps. In the absence of a sexual cycle, members of a population must evolve independently by
accumulating random mutations sequentially. Many useful mutations are lost while deleterious mutations can accumulate.
Adaptation and evolution in this way proceed slowly as compared to sexual evolution.

[0149] As shown in Fig. 17, asexual evolution is a slow and inefficient process. Populations move as individuals rather
than as groups. A diverse population is generated by the mutagenesis of a single parent resulting in a distribution of fit
and unfit individuals. In the absence of a sexual cycle, each piece of genetic information of the surviving population
remains in the individual mutants. Selection of the "fittest" results in many "fit" individuals being discarded along with
the useful genetic information they carry. Asexual evolution proceeds one genetic event at a time and is thus limited by
the intrinsic value of a single genetic event. Sexual evolution moves more quickly and efficiently. Mating within a population
consolidates genetic information within the population and results in useful mutations being combined together. The
combining of useful genetic information results in progeny that are much more fit than their parents. Sexual evolution
thus proceeds much faster by multiple genetic events.

[0150] Years of plant and animal breeding have demonstrated the power of employing sexual recombination to effect
the rapid evolution of complex genomes towards a particular task. This general principle is further demonstrated by
using DNA shuffling to recombine DNA molecules in vitro to accelerate the rate of directed molecular evolution. The
strain improvement efforts of the fermentation industry rely on the directed evolution of microorganisms by sequential
random mutagenesis. Incorporation of recombination into this iterative process greatly accelerates the strain improvement
process, which in turn increases the profitability of current fermentation processes and facilitates the development of
new products.


K. DNA SHUFFLING VS NATURAL RECOMBINATION - THE UTILITY OF POOLWISE RECOMBINATION. 

[0151] DNA shuffling includes the recursive recombination of DNA sequences. A significant difference between DNA
shuffling and natural sexual recombination is that DNA shuffling can produce DNA sequences originating from multiple
parental sequences while sexual recombination produces DNA sequences originating from only two parental sequences
(Fig. 25).

[0152] As shown in Fig. 25, the rate of evolution is in part limited by the number of useful mutations that a member
of a population can accumulate between selection events. In sequential random mutagenesis, useful mutations are
accumulated one per selection event. Many useful mutations are discarded each cycle in favor of the best performer,
and neutral or deleterious mutations which survive are as difficult to lose as they were to gain and thus accumulate. In
sexual evolution, pairwise recombination allows mutations from two different parents to segregate and recombine in
different combinations. Useful mutations can accumulate and deleterious mutations can be lost. Poolwise recombination,
such as that effected by DNA shuffling, has the same advantages as pairwise recombination but allows mutations from
many parents to consolidate into a single progeny. Thus poolwise recombination provides a means for increasing the
number of useful mutations that can accumulate each selection event. The graph in Fig. 25 shows a plot of the potential
number of mutations an individual can accumulate by each of these processes. Recombination is exponentially superior
to sequential random mutagenesis, and this advantage increases exponentially with the number of parents that can
recombine. Sexual recombination is thus more conservative. In nature, the pairwise nature of sexual recombination may
provide important stability within a population by impeding the large changes in DNA sequence that can result from
poolwise recombination. For the purposes of directed evolution, however, poolwise recombination is more efficient.
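The comparison between the three processes can be made concrete with an idealized upper-bound model. This sketch rests on assumptions that are not stated in the specification and may not match the plot in Fig. 25: sequential mutagenesis adds exactly one useful mutation per selection event, and each recombination event merges the distinct useful mutations of p parents without loss, so the potential count after n cycles is p^n. The function name is hypothetical.

```python
def potential_mutations(cycles, parents_per_cross=None):
    """Idealized upper bound on useful mutations accumulated in one lineage.

    parents_per_cross=None models sequential random mutagenesis (one per cycle);
    otherwise each cycle merges the mutations of `parents_per_cross` parents.
    """
    if parents_per_cross is None:
        return cycles
    return parents_per_cross ** cycles


if __name__ == "__main__":
    for n in range(1, 6):
        print(
            n,
            potential_mutations(n),      # sequential random mutagenesis
            potential_mutations(n, 2),   # pairwise (sexual) recombination
            potential_mutations(n, 10),  # poolwise recombination, ten parents
        )
```

In practice the count is capped by the number of distinct useful mutations present in the population, so the exponential figures are a ceiling rather than a prediction.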
[0153] The potential diversity that can be generated from a population is greater as a result of poolwise recombination
as compared to that resulting from pairwise recombination. Further, poolwise recombination enables the combining of
multiple beneficial mutations originating from multiple parental sequences.

[0154] To demonstrate the importance of poolwise recombination vs pairwise recombination in the generation of
molecular diversity, consider the breeding of ten independent DNA sequences each containing only one unique mutation.
There are 2^10 = 1024 different combinations of those ten mutations, ranging from a single sequence having no mutations
(the consensus) to that having all ten mutations. If this pool were recombined together by pairwise recombination, a
population containing the consensus, the parents, and the 45 different combinations of any two of the mutations would
result, giving 56 or ca. 5% of the possible 1024 mutant combinations. Alternatively, if the pool were recombined together in
a poolwise fashion, all 1024 would be theoretically generated, resulting in an approximately 20 fold increase in library
diversity. When looking for a unique solution to a problem in molecular evolution, the more complex the library, the more
complex the possible solution. Indeed, the most fit member of a shuffled library often contains several mutations originating
from several independent starting sequences.
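The arithmetic of the ten-sequence example can be checked directly. The short sketch below counts the combinations using the standard binomial coefficient; the function name and the returned dictionary keys are illustrative choices only.

```python
from math import comb


def diversity_counts(n_mutations):
    """Count mutation combinations reachable from n parents carrying one unique mutation each."""
    # Poolwise recombination can in theory yield every subset of the n mutations.
    total = 2 ** n_mutations
    # One round of pairwise recombination yields at most the consensus (no mutations),
    # the n parents (one mutation each) and every combination of two mutations.
    pairwise = comb(n_mutations, 0) + comb(n_mutations, 1) + comb(n_mutations, 2)
    return {
        "total": total,
        "pairwise": pairwise,
        "pairwise_fraction": pairwise / total,
        "fold_increase": total / pairwise,
    }


if __name__ == "__main__":
    print(diversity_counts(10))
```

For ten mutations this gives 1 + 10 + 45 = 56 pairwise combinations out of 1024, i.e. about 5.5%, and a ratio of roughly 18, consistent with the "ca. 5%" and "approximately 20 fold" figures in the text.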

1. DNA Shuffling Provides Recursive Pairwise Recombination

[0155] In vitro DNA shuffling results in the efficient production of combinatorial genetic libraries by catalyzing the
recombination of multiple DNA sequences. While the result of DNA shuffling is a population representing the poolwise
recombination of multiple sequences, the process does not rely on the recombination of multiple DNA sequences
simultaneously, but rather on their recursive pairwise recombination. The assembly of complete genes from a mixed pool
of small gene fragments requires multiple annealing and elongation cycles, the thermal cycles of the primerless PCR
reaction. During each thermal cycle many pairs of fragments anneal and are extended to form a combinatorial population
of larger chimeric DNA fragments. After the first cycle of reassembly, chimeric fragments contain sequence originating
from predominantly two different parent genes, with all possible pairs of "parental" sequence theoretically represented.
This is similar to the result of a single sexual cycle within a population. During the second cycle, these chimeric fragments
anneal with each other or with other small fragments, resulting in chimeras originating from up to four of the different
starting sequences, again with all possible combinations of the four parental sequences theoretically represented. This
second cycle is analogous to the entire population resulting from a single sexual cross, both parents and offspring,
inbreeding.

[0156] Further cycles result in chimeras originating from 8, 16, 32, etc. parental sequences and are analogous to further
inbreedings of the preceding population. This could be considered similar to the diversity generated from a small
population of birds that are isolated on an island, breeding with each other for many generations. The result mimics the
outcome of "poolwise" recombination, but the path is via recursive pairwise recombination. For this reason, the DNA
molecules generated from in vitro DNA shuffling are not the "progeny" of the starting "parental" sequences, but rather
the great, great, great, great ... (n = number of thermal cycles) grand progeny of the starting "ancestor" molecules.

L. FERMENTATION 


[0157] The fermentation of microorganisms for the production of natural products is the oldest and most sophisticated
application of biocatalysis. Industrial microorganisms effect the multistep conversion of renewable feedstocks to high
value chemical products in a single reactor and in so doing catalyze a multi-billion dollar industry. Fermentation products
range from fine and commodity chemicals such as ethanol, lactic acid, amino acids and vitamins, to high value small
molecule pharmaceuticals, protein pharmaceuticals, and industrial enzymes. (See, e.g., McCoy (1998) C&EN 13-19 for
an introduction to biocatalysis.)

[0158] Success in bringing these products to market and success in competing in the market depend on continuous
improvement of the whole cell biocatalysts. Improvements include increased yield of desired products, removal of
unwanted co-metabolites, improved utilization of inexpensive carbon and nitrogen sources, adaptation to fermenter
conditions, increased production of a primary metabolite, increased production of a secondary metabolite, increased
tolerance to acidic conditions, increased tolerance to basic conditions, increased tolerance to organic solvents, increased
tolerance to high salt conditions and increased tolerance to high or low temperatures. Shortcomings in any of these
areas can result in high manufacturing costs, inability to capture or maintain market share, and failure to bring promising
products to market. For this reason, the fermentation industry invests significant financial and personnel resources in
the improvement of production strains.

[0159] Current strategies for strain improvement rely on the empirical and iterative modification of fermenter conditions
and genetic manipulation of the producing organism. While advances in the molecular biology of established industrial
organisms have been made, rational metabolic engineering is information intensive and is not broadly applicable to less
characterized industrial strains. The most widely practiced strategy for strain improvement employs random mutagenesis
of the producing strain and screening for mutants having improved properties. For mature strains, those subjected to
many rounds of improvement, these efforts routinely provide a 10% increase in product titre per year. Although effective,
this classic strategy is slow, laborious, and expensive. Technological advances in this area are aimed at automation
and increasing sample screening throughput in hopes of reducing the cost of strain improvement. However, the real
technical barrier resides in the intrinsic limitation of single mutations to effect significant strain improvement. The methods
herein overcome this limitation and provide access to multiple useful mutations per cycle which can be used to complement
automation technologies and catalyze strain improvement processes.

[0160] The methods herein allow biocatalysts to be improved at a faster pace than conventional methods. Whole
genome shuffling can at least double the rate of strain improvement for microorganisms used in fermentation as compared
to traditional methods. This provides for a relative decrease in the cost of fermentation processes. New products can
enter the market sooner, producers can increase profits as well as market share, and consumers gain access to more
products of higher quality and at lower prices. Further, increased efficiency of production processes translates to less
waste production and more frugal use of resources. Whole genome shuffling provides a means of accumulating multiple
useful mutations per cycle and thus eliminates the inherent limitation of current strain improvement programs (SIPs).
[0161] DNA shuffling provides recursive mutagenesis, recombination, and selection of DNA sequences. A key difference
between DNA shuffling-mediated recombination and natural sexual recombination is that DNA shuffling effects
both the pairwise (two parents) and the poolwise (multiple parents) recombination of parent molecules, as described
supra. Natural recombination is more conservative and is limited to pairwise recombination. In nature, pairwise
recombination provides stability within a population by preventing large leaps in sequences or genomic structure that can result
from poolwise recombination. However, for the purposes of directed evolution, poolwise recombination is appealing
since the beneficial mutations of multiple parents can be combined during a single cross to produce a superior offspring.
Poolwise recombination is analogous to the crossbreeding of inbred strains in classic strain improvement, except that
the crosses occur between many strains at once. In essence, poolwise recombination is a sequence of events that
effects the recombination of a population of nucleic acid sequences and results in the generation of new nucleic acids
that contain genetic information from more than two of the original nucleic acids. The power of in vitro DNA shuffling
is that large combinatorial libraries can be generated from a small pool of DNA fragments reassembled by recursive
pairwise annealing and extension reactions, "matings." Many of the in vivo recombination formats described (such as
plasmid-plasmid, plasmid-chromosome, phage-phage, phage-chromosome, phage-plasmid, conjugal DNA-chromosome,
exogenous DNA-chromosome, chromosome-chromosome, with the DNA being introduced into the cell by natural
and non-natural competence, transduction, transfection, conjugation, protoplast fusion, etc.) result primarily in the
pairwise recombination of two DNA molecules. Thus, these formats when executed for only a single cycle of recombination
are inherently limited in their potential to generate molecular diversity. To generate the level of diversity obtained by in
vitro DNA shuffling methods, pairwise mating formats must be carried out recursively, i.e., for many generations, prior to
screening for improved sequences. Thus a pool of DNA sequences, such as four independent chromosomes, must be
recombined, for example by protoplast fusion, and the progeny of that recombination (each representing a unique
outcome of the pairwise mating) must then be pooled, without selection, and then recombined again, and again, and
again. This process should be repeated for a sufficient number of cycles to result in progeny having the desired complexity.
Only once sufficient diversity has been generated should the resulting population be screened for new and improved
sequences.
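
The effect of recursion on diversity can be illustrated with a toy simulation. The following Python sketch is not part of the methods described herein; it assumes free reassortment at each of twelve hypothetical loci, random pairing of individuals, and a fixed population size, all of which are simplifications introduced only for illustration. Each locus carries the label of the parent from which it derives, so the number of distinct labels in a genome counts the parents represented in it.

```python
import random


def pairwise_cross(a, b, rng):
    """Recombine two genomes locus by locus (free reassortment is assumed)."""
    return [rng.choice(pair) for pair in zip(a, b)]


def recursive_mating(parents, cycles, pop_size, seed=0):
    """Cross random pairs, pool all progeny without selection, and repeat."""
    rng = random.Random(seed)
    population = [list(p) for p in parents]
    for _ in range(cycles):
        population = [
            pairwise_cross(*rng.sample(population, 2), rng)
            for _ in range(pop_size)
        ]
    return population


def max_parents_represented(population):
    """Largest number of distinct parental origins found in any one genome."""
    return max(len(set(genome)) for genome in population)


# Four parental genomes; each of 12 loci is labeled by its parent of origin.
parents = [[label] * 12 for label in "ABCD"]
after_one = recursive_mating(parents, cycles=1, pop_size=200)
after_five = recursive_mating(parents, cycles=5, pop_size=200)
print(max_parents_represented(after_one))   # never more than 2 after one pairwise cycle
print(max_parents_represented(after_five))  # more parents can contribute after recursion
```

In this model a single cycle can never yield a genome drawing on more than two parents, whereas pooling the unselected progeny and crossing again allows information from three or four parents to accumulate in one genome.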

[0162] There are a few general methods for effecting efficient recombination in prokaryotes. Bacteria have no known
sexual cycle per se, but there are natural mechanisms by which the genomes of these organisms undergo recombination.
These mechanisms include natural competence, phage-mediated transduction, and cell-cell conjugation. Bacteria that
are naturally competent are capable of efficiently taking up naked DNA from the environment. If homologous, this DNA
undergoes recombination with the genome of the cell, resulting in genetic exchange. Bacillus subtilis, the primary
production organism of the enzyme industry, is known for the efficiency with which it carries out this process.
[0163] In generalized transduction, a bacteriophage mediates genetic exchange. A transducing phage will often
package headfuls of the host genome. These phage can infect a new host and deliver a fragment of the former host genome
which is frequently integrated via homologous recombination. Cells can also transfer DNA between themselves by
conjugation. Cells containing the appropriate mating factors transfer episomes as well as entire chromosomes to an
appropriate acceptor cell where they can recombine with the acceptor genome. Conjugation resembles sexual recombination
for microbes and can be intraspecific, interspecific, and intergeneric. For example, an efficient means of transforming
Streptomyces sp., a genus responsible for producing many commercial antibiotics, is by the conjugal transfer of plasmids
from Escherichia coli.

[0164] For many industrial microorganisms, knowledge of competence, transducing phage, or fertility factors is lacking.
Protoplast fusion has been developed as a versatile and general alternative to these natural methods of recombination.
Protoplasts are prepared by removing the cell wall by treating cells with lytic enzymes in the presence of osmotic
stabilizers. In the presence of a fusogenic agent, such as polyethylene glycol (PEG), protoplasts are induced to fuse
and form transient hybrids or "fusants." During this hybrid state, genetic recombination occurs at high frequency allowing
the genomes to reassort. The final crucial step is the successful segregation and regeneration of viable cells from the
fused protoplasts. Protoplast fusion can be intraspecific, interspecific, and intergeneric and has been applied to both
prokaryotes and eukaryotes. In addition, it is possible to fuse more than two cells, thus providing a mechanism for
effecting poolwise recombination. While no fertility factors, transducing phages or competency development is needed
for protoplast fusion, a method for the formation, fusing, and regeneration of protoplasts is typically optimized for each
organism. Protoplast fusion as applied to poolwise recombination is described in more detail, supra.

[0165] One key to SIP is having an assay that can be dependably used to identify a few mutants out of thousands
that have subtle increases in product yield. The limiting factor in many assay formats is the uniformity of cell growth.
This variation is the source of baseline variability in subsequent assays. Inoculum size and culture environment
(temperature/humidity) are sources of cell growth variation. Automation of all aspects of establishing initial cultures and
state-of-the-art temperature and humidity controlled incubators are useful in reducing variability.

[0166] Mutant cells or spores are separated on solid media to produce individual sporulating colonies. Using an
automated colony picker (Q-bot, Genetix, U.K.), colonies are identified and picked, and 10,000 different mutants are inoculated
into 96 well microtitre dishes containing two 3 mm glass balls/well. The Q-bot does not pick an entire colony but rather
inserts a pin through the center of the colony and exits with a small sampling of cells (or mycelia) and spores. The time
the pin is in the colony, the number of dips to inoculate the culture medium, and the time the pin is in that medium each
affect inoculum size, and each can be controlled and optimized. The uniform process of the Q-bot decreases human
handling error and increases the rate of establishing cultures (roughly 10,000/4 hours). These cultures are then shaken
in a temperature and humidity controlled incubator. The glass balls act to promote uniform aeration of cells and the
dispersal of mycelial fragments similar to the blades of a fermenter. An embodiment of this procedure is further illustrated
in Fig. 28, including an integrated system for the assay.
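
The scale implied by these figures can be checked with simple arithmetic. The short Python sketch below is illustrative only and assumes one mutant per well with no wells reserved for controls, which the text does not specify; the function names are hypothetical.

```python
import math


def plates_needed(n_mutants, wells_per_plate=96):
    """Number of microtitre plates required, assuming one mutant per well."""
    return math.ceil(n_mutants / wells_per_plate)


def picks_per_hour(n_cultures, hours):
    """Average rate of establishing cultures."""
    return n_cultures / hours


def glass_balls_needed(n_wells, balls_per_well=2):
    """Glass balls required for the inoculated wells."""
    return n_wells * balls_per_well


print(plates_needed(10_000))        # 105
print(picks_per_hour(10_000, 4))    # 2500.0
print(glass_balls_needed(10_000))   # 20000
```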

1. Prescreen

[0167] The ability to detect a subtle increase in the performance of a mutant over that of a parent strain relies on the
sensitivity of the assay. The chance of finding the organisms having an improvement is increased by the number of
individual mutants that can be screened by the assay. To increase the chances of identifying a pool of sufficient size, a
prescreen that increases the number of mutants processed by 10-fold can be used. The goal of the primary screen will
be to quickly identify mutants having equal or better product titres than the parent strain(s) and to move only these
mutants forward to liquid cell culture.
[0168] The primary screen is an agar plate screen that is analyzed by the Q-bot colony picker. Although assays can be
fundamentally different, many result, e.g., in the production of colony halos. For example, antibiotic production is assayed
on plates using an overlay of a sensitive indicator strain, such as B. subtilis. Antibiotic production is typically assayed
as a zone of clearing (inhibited growth of the indicator organism) around the producing organism. Similarly, enzyme
production can be assayed on plates containing the enzyme substrate, with activity being detected as a zone of substrate
modification around the producing colony. Product titre is correlated with the ratio of halo area to colony area.
[0169] The Q-bot or other automated system is instructed to pick only colonies having a halo ratio in the top 10% of
the population, i.e., 10,000 mutants from the 100,000 entering the plate prescreen. This increases the number of improved
clones in the secondary assay and eliminates the wasted effort of screening knock-out and low producers. This improves
the "hit rate" of the secondary assay.
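
The ranking step of such a prescreen can be sketched as follows. This Python example is a minimal illustration and not the Q-bot software; it assumes that halos and colonies are approximately circular and that their diameters have already been measured by image analysis. The field names `halo_mm` and `colony_mm` and both function names are hypothetical.

```python
def halo_ratio(halo_diameter_mm, colony_diameter_mm):
    """Ratio of halo area to colony area for circular zones (pi cancels)."""
    return (halo_diameter_mm / colony_diameter_mm) ** 2


def pick_top_percent(colonies, percent=10):
    """Return the colonies whose halo ratio falls in the top `percent`."""
    n_pick = len(colonies) * percent // 100
    ranked = sorted(
        colonies,
        key=lambda c: halo_ratio(c["halo_mm"], c["colony_mm"]),
        reverse=True,
    )
    return ranked[:n_pick]


# Twenty made-up colonies of equal size with increasing halo diameters.
colonies = [{"id": i, "halo_mm": 3.0 + i, "colony_mm": 2.0} for i in range(20)]
picked = pick_top_percent(colonies)
print([c["id"] for c in picked])  # [19, 18]
```

Applied to 100,000 colonies with the default of 10 percent, the same function would forward 10,000 colonies to the secondary assay.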

M. PROMOTION OF GENETIC EXCHANGE 

1. General

[0170] Some methods of the invention effect recombination of cellular DNA by propagating cells under conditions
inducing exchange of DNA between cells. DNA exchange can be promoted by generally applicable methods such as
electroporation, biolistics, cell fusion, or in some instances, by conjugation, transduction, or Agrobacterium-mediated
transfer and meiosis. For example, Agrobacterium can transform S. cerevisiae with T-DNA, which is incorporated into
the yeast genome by both homologous recombination and a gap repair mechanism. (Piers et al., Proc. Natl. Acad. Sci.
USA 93(4), 1613-8 (1996)).

[0171] In some methods, initial diversity between cells (i.e., before genome exchange) is induced by chemical or
radiation-induced mutagenesis of a progenitor cell type, optionally followed by screening for a desired phenotype. In
other methods, diversity is natural as where cells are obtained from different individuals, strains or species.

[0172] In some shuffling methods, induced exchange of DNA is used as the sole means of effecting recombination in 
each cycle of recombination. In other methods, induced exchange is used in combination with natural sexual recombi- 
nation of an organism. In other methods, induced exchange and/or natural sexual recombination are used in combination 
with the introduction of a fragment library. Such a fragment library can be a whole genome, a whole chromosome, a
group of functionally or genetically linked genes, a plasmid, a cosmid, a mitochondrial genome, a viral genome (replicative
and nonreplicative) or specific or random fragments of any of these. The DNA can be linked to a vector or can be in free 
form. Some vectors contain sequences promoting homologous or nonhomologous recombination with the host genome. 
Some fragments contain double stranded breaks such as caused by shearing with glass beads, sonication, or chemical 
or enzymatic fragmentation, to stimulate recombination. 

[0173] In each case, DNA can be exchanged between cells after which it can undergo recombination to form hybrid
genomes. Generally, cells are recursively subjected to recombination to increase the diversity of the population prior to
screening. Cells bearing hybrid genomes, e.g., generated after at least one, and usually several, cycles of recombination,
are screened for a desired phenotype, and cells having this phenotype are isolated. These cells can additionally form
starting materials for additional cycles of recombination in a recursive recombination/selection scheme.

[0174] One means of promoting exchange of DNA between cells is by fusion of cells, such as by protoplast fusion. A
protoplast results from the removal from a cell of its cell wall, leaving a membrane-bound cell that depends on an isotonic
or hypertonic medium for maintaining its integrity. If the cell wall is partially removed, the resulting cell is strictly referred
to as a spheroplast and if it is completely removed, as a protoplast. However, here the term protoplast includes
spheroplasts unless otherwise indicated.

[0175] Protoplast fusion is described by Shaffner et al., Proc. Natl. Acad. Sci. USA 77, 2163 (1980) and other exemplary
procedures are described by Yoakum et al., US 4,608,339, Takahashi et al., US 4,677,066 and Sambrook et al., at
Ch. 16. Protoplast fusion has been reported between strains, species, and genera (e.g., yeast and chicken erythrocyte).
[0176] Protoplasts can be prepared for both bacterial and eukaryotic cells, including mammalian cells and plant cells,
by several means including chemical treatment to strip cell walls. For example, cell walls can be stripped by digestion
with a cell wall degrading enzyme such as lysozyme in a 10-20% sucrose, 50 mM EDTA buffer. Conversion of cells to
spherical protoplasts can be monitored by phase-contrast microscopy. Protoplasts can also be prepared by propagation
of cells in media supplemented with an inhibitor of cell wall synthesis, or use of mutant strains lacking capacity for cell
wall formation. Preferably, eukaryotic cells are synchronized in G1 phase by arrest with inhibitors such as α-factor, K.
lactis killer toxin, leflonamide and adenylate cyclase inhibitors. Optionally, some but not all protoplasts to be fused can
be killed and/or have their DNA fragmented by treatment with ultraviolet irradiation, hydroxylamine or cupferon (Reeves
et al., FEMS Microbiol. Lett. 99, 193-198 (1992)). In this situation, killed protoplasts are referred to as donors, and viable
protoplasts as acceptors. Using dead donor cells can be advantageous in subsequently recognizing fused cells with
hybrid genomes, as described below. Further, breaking up DNA in donor cells is advantageous for stimulating
recombination with acceptor DNA. Optionally, acceptor and/or fused cells can also be briefly, but nonlethally, exposed to UV
irradiation to further stimulate recombination.

[0177] Once formed, protoplasts can be stabilized in a variety of osmolytes and compounds such as sodium chloride,
potassium chloride, sodium phosphate, potassium phosphate, sucrose, and sorbitol in the presence of DTT. The combination
of buffer, pH, reducing agent, and osmotic stabilizer can be optimized for different cell types. Protoplasts can be induced
to fuse by treatment with a chemical such as PEG, calcium chloride or calcium propionate or by electrofusion (Tsoneva,
Acta Microbiologica Bulgarica 24, 53-59 (1989)). A method of cell fusion employing electric fields has also been described.
See Chang, US 4,970,154. Conditions can be optimized for different strains.

[0178] The fused cells are heterokaryons containing genomes from two or more component protoplasts. Fused cells
can be enriched from unfused parental cells by sucrose gradient sedimentation or cell sorting. The two nuclei in the
heterokaryons can fuse (karyogamy) and homologous recombination can occur between the genomes. The
chromosomes can also segregate asymmetrically resulting in regenerated protoplasts that have lost or gained whole
chromosomes. The frequency of recombination can be increased by treatment with ultraviolet irradiation or by use of strains
overexpressing recA or other recombination genes, or the yeast rad genes, and cognate variants thereof in other species,
or by the inhibition of gene products of MutS, MutL, or MutD. Overexpression can be either the result of introduction of
exogenous recombination genes or the result of selecting strains, which as a result of natural variation or induced
mutation, overexpress endogenous recombination genes. The fused protoplasts are propagated under conditions
allowing regeneration of cell walls, recombination and segregation of recombinant genomes into progeny cells from the
heterokaryon and expression of recombinant genes. This process can be reiteratively repeated to increase the diversity
of any set of protoplasts or cells. After, or occasionally before or during, recovery of fused cells, the cells are screened
or selected for evolution toward a desired property.

[0179] Thereafter, a subsequent round of recombination can be performed by preparing protoplasts from the cells
surviving selection/screening in a previous round. The protoplasts are fused, recombination occurs in fused protoplasts,
and cells are regenerated from the fused protoplasts. This process can again be reiteratively repeated to increase the
diversity of the starting population. Protoplasts, regenerated or regenerating cells are subject to further selection or
screening.

[0180] Subsequent rounds of recombination can be performed on a split pool basis as described above. That is, a
first subpopulation of cells surviving selection/screening from a previous round are used for protoplast formation. A
second subpopulation of cells surviving selection/screening from a previous round are used as a source for DNA library
preparation. The DNA library from the second subpopulation of cells is then transformed into the protoplasts from the
first subpopulation. The library undergoes recombination with the genomes of the protoplasts to form recombinant
genomes. This process can be repeated several times in the absence of a selection event to increase the diversity of
the cell population. Cells are regenerated from protoplasts, and selection/screening is applied to regenerating or
regenerated cells. In a further variation, a fresh library of nucleic acid fragments is introduced into protoplasts surviving
selection/screening from a previous round.

[0181] An exemplary format for shuffling using protoplast fusion is shown in Fig. 5. The figure shows the following
steps: protoplast formation of donor and recipient strains, heterokaryon formation, karyogamy, recombination, and
segregation of recombinant genomes into separate cells. Optionally, the recombinant genomes, if having a sexual cycle,
can undergo further recombination with each other as a result of meiosis and mating. Recursive cycles of protoplast
fusion, or recursive mating/meiosis, are often used to increase the diversity of a cell population. After achieving a sufficiently
diverse population via one of these forms of recombination, cells are screened or selected for a desired property. Cells
surviving selection/screening can then be used as the starting materials in a further cycle of protoplasting or other
recombination methods as noted herein.

2. Selection For Hybrid Strains

[0182] The invention provides selection strategies to identify cells formed by fusion of components from parental cells
from two or more distinct subpopulations. Selection for hybrid cells is usually performed before selecting or screening
for cells that have evolved (as a result of genetic exchange) to acquisition of a desired property. A basic premise of most
such selection schemes is that two initial subpopulations have two distinct markers. Cells with hybrid genomes can thus
be identified by selection for both markers.

[0183] In one such scheme, at least one subpopulation of cells bears a selective marker attached to its cell membrane.
Examples of suitable membrane markers include biotin, fluorescein and rhodamine. The markers can be linked to amide
or thiol groups or through more specific derivatization chemistries, such as iodoacetates, iodoacetamides, and maleimides.
For example, a marker can be attached as follows. Cells or protoplasts are washed with a buffer (e.g., PBS), which does
not interfere with the chemical coupling of a chemically active ligand which reacts with amino groups of lysines or
N-terminal amino groups of membrane proteins. The ligand is either amine reactive itself (e.g., isothiocyanates, succinimidyl
esters, sulfonyl chlorides) or is activated by a heterobifunctional linker (e.g., EMCS, SIAB, SPDP, SMB) to become amine
reactive. The ligand is a molecule which is easily bound by protein derivatized magnetic beads or other capturing solid
supports. For example, the ligand can be succinimidyl activated biotin (Molecular Probes Inc.: B-1606, B-2603, S-1515,
S-1582). This linker is reacted with amino groups of proteins residing in and on the surface of a cell. The cells are then
washed to remove excess labelling agent before contacting with cells from the second subpopulation bearing a second
selective marker.






[0184] The second subpopulation of cells can also bear a membrane marker, albeit a different membrane marker from
the first subpopulation. Alternatively, the second subpopulation can bear a genetic marker. The genetic marker can 
confer a selective property such as drug resistance or a screenable property, such as expression of green fluorescent 
protein. 

[0185] After fusion of first and second subpopulations of cells and recovery, cells are screened or selected for the
presence of markers on both parental subpopulations. For example, fusants are enriched for one population by adsorption
to specific beads and these are then sorted by FACS for those expressing a marker. Cells surviving both screens for
both markers are those having undergone protoplast fusion, and are therefore more likely to have recombined genomes.
Usually, the markers are screened or selected separately. Membrane-bound markers, such as biotin, can be screened
by affinity enrichment for the cell membrane marker (e.g., by panning fused cells on an affinity matrix). For example, for
a biotin membrane label, cells can be affinity purified using streptavidin-coated magnetic beads (Dynal). These beads
are washed several times to remove the non-fused host cells. Alternatively, cells can be panned against an antibody to
the membrane marker. In a further variation, if the membrane marker is fluorescent, cells bearing the marker can be
identified by FACS. Screens for genetic markers depend on the nature of the markers, and include capacity to grow on
drug-treated media or FACS selection for green fluorescent protein. If first and second cell populations have fluorescent
markers of different wavelengths, both markers can be screened simultaneously by FACS sorting.
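
The logic of sequential selection for two markers can be sketched in code. The following Python example is a simplified model, assuming idealized screens without false positives or negatives; the `Cell` class, its fields, and the function names are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    name: str
    biotin: bool  # membrane marker carried by the first subpopulation
    gfp: bool     # genetic marker carried by the second subpopulation


def affinity_enrich(cells):
    """Model of bead capture: keep cells displaying the membrane marker."""
    return [c for c in cells if c.biotin]


def facs_sort(cells):
    """Model of FACS: keep cells expressing the fluorescent genetic marker."""
    return [c for c in cells if c.gfp]


def select_fusants(cells):
    """Apply the two screens in sequence; only dual-marker cells survive."""
    return facs_sort(affinity_enrich(cells))


population = [
    Cell("unfused parent 1", biotin=True, gfp=False),
    Cell("unfused parent 2", biotin=False, gfp=True),
    Cell("fusant", biotin=True, gfp=True),
]
print([c.name for c in select_fusants(population)])  # ['fusant']
```

Because each unfused parent carries only one marker, applying the two screens in either order leaves only cells derived from both subpopulations.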
[0186] In a further selection scheme for hybrid cells, first and second populations of cells to be fused express different
subunits of a heteromultimeric enzyme. Usually, the heteromultimeric enzyme has two different subunits, but
heteromultimeric enzymes having three, four or more different subunits can be used. If an enzyme has more than two different
subunits, each subunit can be expressed in a different subpopulation of cells (e.g., three subunits in three subpopulations),
or more than one subunit can be expressed in the same subpopulation of cells (e.g., one subunit in one subpopulation,
two subunits in a second subpopulation). In the case where more than two subunits are used, selection for the poolwise
recombination of more than two protoplasts can be achieved.

[0187] Hybrid cells representing a combination of genomes of first, second or more subpopulation component cells
can then be recognized by an assay for intact enzyme. Such an assay can be a binding assay, but is more typically a
functional assay (e.g., capacity to metabolize a substrate of the enzyme). Enzymatic activity can be detected for example
by processing of a substrate to a product with a fluorescent or otherwise easily detectable absorbance or emission
spectrum. The individual subunits of a heteromultimeric enzyme used in such an assay preferably have no enzymic
activity in dissociated form, or at least have significantly less activity in dissociated form than associated form. Preferably,
the cells used for fusion lack an endogenous form of the heteromultimeric enzyme, or at least have significantly less
endogenous activity than results from heteromultimeric enzyme formed by fusion of cells.
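
The subunit-based scheme can be expressed as a simple set model. In the Python sketch below, which is illustrative only, each cell is represented by the set of subunit genes it carries, a fusant carries the union of its components' genes, and activity requires every subunit; the subunit names and function names are hypothetical, and background activity of dissociated subunits is assumed to be zero.

```python
def fuse(*cells):
    """A fusant carries every subunit gene carried by its component cells."""
    return frozenset().union(*cells)


def enzyme_active(cell, required_subunits):
    """The enzyme is scored as active only if all subunits are present."""
    return required_subunits <= cell


REQUIRED = frozenset({"alpha", "beta", "gamma"})  # hypothetical three-subunit enzyme
pop_a = frozenset({"alpha"})
pop_b = frozenset({"beta"})
pop_c = frozenset({"gamma"})

print(enzyme_active(fuse(pop_a, pop_b), REQUIRED))         # False
print(enzyme_active(fuse(pop_a, pop_b, pop_c), REQUIRED))  # True
```

With one subunit per subpopulation, only fusants combining cells from all three subpopulations score as active, which corresponds to selecting for poolwise rather than pairwise fusion.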

[0188] Penicillin acylase enzymes, cephalosporin acylase and penicillin acyltransferase are examples of suitable
heteromultimeric enzymes. These enzymes are encoded by a single gene, which is translated as a proenzyme and
cleaved by posttranslational autocatalytic proteolysis to remove a spacer endopeptide and generate two subunits, which
associate to form the active heterodimeric enzyme. Neither subunit is active in the absence of the other subunit. However,
activity can be reconstituted if these separated gene portions are expressed in the same cell by co-transformation. Other
enzymes that can be used have subunits that are encoded by distinct genes (e.g., faoA and faoB genes encode
3-oxoacyl-CoA thiolase of Pseudomonas fragi (Biochem. J. 328, 815-820 (1997))).

[0189] An exemplary enzyme is penicillin G acylase from Escherichia coli, which has two subunits encoded by a single
gene. Fragments of the gene encoding the two subunits operably linked to appropriate expression regulation sequences
are transfected into first and second subpopulations of cells, which lack endogenous penicillin acylase activity. A cell
formed by fusion of component cells from the first and second subpopulations expresses the two subunits, which assemble
to form functional enzyme, e.g., penicillin acylase. Fused cells can then be selected on agar plates containing penicillin
G, which is degraded by penicillin acylase.

[0190] In another variation, fused cells are identified by complementation of auxotrophic mutants. Parental
subpopulations of cells can be selected for known auxotrophic mutations. Alternatively, auxotrophic mutations in a starting
population of cells can be generated spontaneously or by exposure to a mutagenic agent. Cells with auxotrophic mutations
are selected by replica plating on minimal and complete media. Lesions resulting in auxotrophy are expected to be
scattered throughout the genome, in genes for amino acid, nucleotide, and vitamin biosynthetic pathways. After fusion
of parental cells, cells resulting from fusion can be identified by their capacity to grow on minimal media. These cells
can then be screened or selected for evolution toward a desired property. Further steps of mutagenesis generating fresh
auxotrophic mutations can be incorporated in subsequent cycles of recombination and screening/selection.
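
Complementation of auxotrophs can likewise be modeled with sets. The Python sketch below is a simplification that assumes each lesion is recessive and is fully complemented by a functional copy from the other parent; the lesion names and function names are hypothetical.

```python
def fuse(lesions_a, lesions_b):
    """A fusant remains deficient only in pathways defective in both parents."""
    return lesions_a & lesions_b


def grows_on_minimal(lesions):
    """A cell grows on minimal medium only if it has no uncomplemented lesion."""
    return len(lesions) == 0


his_minus = frozenset({"his"})
leu_minus = frozenset({"leu"})
his_trp_minus = frozenset({"his", "trp"})

print(grows_on_minimal(his_minus))                       # False: parent is auxotrophic
print(grows_on_minimal(fuse(his_minus, leu_minus)))      # True: lesions complement
print(grows_on_minimal(fuse(his_minus, his_trp_minus)))  # False: shared his lesion
```

The last case shows why parental subpopulations with non-overlapping lesions are useful in this model: a lesion shared by both parents cannot be complemented by fusion.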
[0191] In variations of the above method, de novo generation of auxotrophic mutations in each round of shuffling can
be avoided by reusing the same auxotrophs. For example, auxotrophs can be generated by transposon mutagenesis
using a transposon bearing a selective marker. Auxotrophs are identified by a screen such as replica plating. Auxotrophs
are pooled, and a generalized transducing phage lysate is prepared by growth of phage on a population of auxotrophic
cells. A separate population of auxotrophic cells is subjected to genetic exchange, and complementation is used to select
cells that have undergone genetic exchange and recombination. These cells are then screened or selected for acquisition
of a desired property. Cells surviving screening or selection then have auxotrophic markers regenerated by introduction
of the transducing transposon library. The newly generated auxotrophic cells can then be subject to further genetic
exchange and screening/selection.

[0192] In a further variation, auxotrophic mutations are generated by homologous recombination with a targeting vector
comprising a selective marker flanked by regions of homology with a biosynthetic region of the genome of cells to be
evolved. Recombination between the vector and the genome inserts the positive selection marker into the genome
causing an auxotrophic mutation. The vector is in linear form before introduction into cells. Optionally, the frequency of
introduction of the vector can be increased by capping its ends with self-complementary oligonucleotides annealed in
a hairpin formation. Genetic exchange and screening/selection proceed as described above. In each round, targeting
vectors are reintroduced regenerating the same population of auxotrophic markers.

[0193] In another variation, fused cells are identified by screening for a genomic marker present on one subpopulation
of parental cells and an episomal marker present on a second subpopulation of cells. For example, a first subpopulation
of yeast containing mitochondria can be used to complement a second subpopulation of yeast having a petite phenotype
(i.e., lacking mitochondria).

[0194] In a further variation, genetic exchange is performed between two subpopulations of cells, one of which is dead.
Cells are preferably killed by brief exposure to DNA fragmenting agents such as hydroxylamine, cupferon, or irradiation.
Viable cells are then screened for a marker present on the dead parental subpopulation.

3. Liposome-mediated transfers 


[0195] In the methods noted above, in which nucleic acid fragment libraries are introduced into protoplasts, the nucleic
acids are sometimes encapsulated in liposomes to facilitate uptake by protoplasts. Liposome-mediated uptake of DNA
by protoplasts is described in Redford et al., Mol. Gen. Genet. 184, 567-569 (1981). Liposomes can efficiently deliver
large volumes of DNA to protoplasts (see Deshayes et al., EMBO J. 4, 2731-2737 (1985)). See also, Philippot and
Schuber (eds) (1995) Liposomes as Tools in Basic Research and Industry CRC press, Boca Raton, e.g., Chapter 9,
Remy et al. "Gene Transfer with Cationic Amphiphiles." Further, the DNA can be delivered as linear fragments, which
are often more recombinogenic than whole genomes. In some methods, fragments are mutated prior to encapsulation
in liposomes. In some methods, fragments are combined with RecA and homologs, or nucleases (e.g., restriction
endonucleases) before encapsulation in liposomes to promote recombination. Alternatively, protoplasts can be treated
with lethal doses of nicking reagents and then fused. Cells which survive are those which are repaired by recombination
with other genomic fragments, thereby providing a selection mechanism to select for recombinant (and therefore desirably
diverse) protoplasts.

4. Shuffling filamentous fungi 

[0196] Filamentous fungi are particularly suited to performing the shuffling methods described above. Filamentous
fungi are divided into four main classifications based on their structures for sexual reproduction: Phycomycetes,
Ascomycetes, Basidiomycetes and the Fungi Imperfecti. Phycomycetes (e.g., Rhizopus, Mucor) form sexual spores in a
sporangium. The spores can be uni- or multinucleate, and the hyphae often lack septa (coenocytic). Ascomycetes (e.g.,
Aspergillus, Neurospora, Penicillium) produce sexual spores in an ascus as a result of meiotic division. Asci typically
contain 4 meiotic products, but some contain 8 as a result of an additional mitotic division. Basidiomycetes include
mushrooms and smuts, and form sexual spores on the surface of a basidium. In holobasidiomycetes, such as mushrooms,
the basidium is undivided. In hemibasidiomycetes, such as rusts (Uredinales) and smut fungi (Ustilaginales), the basidium
is divided. Fungi imperfecti, which include most human pathogens, have no known sexual stage.

[0197] Fungi can reproduce by asexual, sexual or parasexual means. Asexual reproduction involves vegetative growth
of mycelia, nuclear division and cell division without involvement of gametes and without nuclear fusion. Cell division
can occur by sporulation, budding or fragmentation of hyphae.

[0198] Sexual reproduction provides a mechanism for shuffling genetic material between cells. A sexual reproductive
cycle is characterized by an alternation of a haploid phase and a diploid phase. Diploidy occurs when two haploid gamete
nuclei fuse (karyogamy). The gamete nuclei can come from the same parental strain (self-fertile), as in the
homothallic fungi. In heterothallic fungi, the gamete nuclei come from strains of different mating type.
[0199] A diploid cell converts to haploidy via meiosis, which essentially consists of two divisions of the nucleus
accompanied by one division of the chromosomes. The products of one meiosis are a tetrad (4 haploid nuclei). In some
cases, a mitotic division occurs after meiosis, giving rise to eight product cells. The arrangement of the resultant cells
(usually enclosed in spores) resembles that of the parental strains. The length of the haploid and diploid stages differs
in various fungi: for example, the Basidiomycetes and many of the Ascomycetes have a mostly haploid life cycle (that
is, meiosis occurs immediately after karyogamy), whereas others (e.g., Saccharomyces cerevisiae) are diploid for most
of their life cycle (karyogamy occurs soon after meiosis). Sexual reproduction can occur between cells in the same strain
(selfing) or between cells from different strains (outcrossing).

[0200] Sexual dimorphism (dioecism) is the separate production of male and female organs on different mycelia. This
is a rare phenomenon among the fungi, although a few examples are known. Heterothallism (one locus-two alleles)
allows for outcrossing between cross-compatible strains which are self-incompatible. The simplest form is the two
allele-one locus system of mating types/factors, illustrated by the following organisms: A and a in Neurospora; a and α in
Saccharomyces; plus and minus in Schizosaccharomyces and Zygomycetes; and a1 and a2 in Ustilago.
[0201] Multiple-allelomorph heterothallism is exhibited by some of the higher Basidiomycetes (e.g., Gasteromycetes
and Hymenomycetes), which are heterothallic and have several mating types determined by multiple alleles.
Heterothallism in these organisms is either bipolar with one mating type factor, or tetrapolar with two unlinked factors, A and B.
Stable, fertile heterokaryon formation depends on the presence of different A factors and, in the case of tetrapolar
organisms, of different B factors as well. This system is effective in the promotion of outbreeding and the prevention of
self-breeding. The number of different mating factors may be very large (i.e., thousands) (Kothe, FEMS Microbiol. Rev.
18, 65-87 (1996)), and non-parental mating factors may arise by recombination.

[0202] Parasexual reproduction provides a further means for shuffling genetic material between cells. This process
allows recombination of parental DNA without involvement of mating types or gametes. Parasexual fusion occurs by
hyphal fusion giving rise to a common cytoplasm containing different nuclei. The two nuclei can divide independently in
the resulting heterokaryon but occasionally fuse. Fusion is followed by haploidization, which can involve loss of
chromosomes and mitotic crossing over between homologous chromosomes. Protoplast fusion is a form of parasexual
reproduction.

[0203] Within the above four classes, fungi are also classified by vegetative compatibility group. Fungi within a
vegetative compatibility group can form heterokaryons with each other. Thus, for exchange of genetic material between
different strains of fungi, the fungi are usually prepared from the same vegetative compatibility group. However, some
genetic exchange can occur between fungi from different incompatibility groups as a result of parasexual reproduction
(see Timberlake et al., US 5,605,820). Further, as discussed elsewhere, the natural vegetative compatibility group of
fungi can be expanded as a result of shuffling.

[0204] Several isolates of Aspergillus nidulans, A. flavus, A. fumigatus, Penicillium chrysogenum, P. notatum,
Cephalosporium chrysogenum, Neurospora crassa, and Aureobasidium pullulans have been karyotyped. Genome sizes generally
range between 20 and 50 Mb among the Aspergilli. Differences in karyotypes often exist between similar strains and
are also caused by transformation with exogenous DNA. Filamentous fungal genes contain introns, usually ~50-100 bp
in size, with similar consensus 5' and 3' splice sequences. Promotion and termination signals are often cross-recognizable,
enabling the expression of a gene/pathway from one fungus (e.g., A. nidulans) in another (e.g., P. chrysogenum).
[0205] The major components of the fungal cell wall are chitin (or chitosan), β-glucan, and mannoproteins. Chitin and
β-glucan form the scaffolding; mannoproteins are interstitial components which dictate the wall's porosity, antigenicity
and adhesion. Chitin synthetase catalyzes the polymerization of β-(1,4)-linked N-acetylglucosamine (GlcNAc) residues,
forming linear strands running antiparallel; β-(1,3)-glucan synthetase catalyzes the homopolymerization of glucose.

[0206] One general goal of shuffling is to evolve fungi to become useful hosts for genetic engineering, in particular for
the shuffling of unrelated genes. A. nidulans and Neurospora are generally the fungal organisms of choice to serve as
hosts for such manipulations because of their sexual cycles and well-established use in classical and molecular
genetics. Another general goal is to improve the capacity of fungi to make specific compounds (e.g., antibacterials
(penicillins, cephalosporins), antifungals (e.g., echinocandins, aureobasidins), and wood-degrading enzymes). There is
some overlap between these general goals, and thus, some desired properties are useful for achieving both goals.
[0207] One desired property is the introduction of meiotic apparatus into fungi presently lacking a sexual cycle (see
Sharon et al., Mol. Gen. Genet. 251, 60-68 (1996)). A scheme for introducing a sexual cycle into the fungus P. chrysogenum
(a fungus imperfecti) is shown in Fig. 6. Subpopulations of protoplasts are formed from A. nidulans (which has a sexual
cycle) and P. chrysogenum (which does not). The two strains preferably bear different markers. The A. nidulans protoplasts
are killed by treatment with UV or hydroxylamine. The two subpopulations are fused to form heterokaryons. In some
heterokaryons, nuclei fuse, and some recombination occurs. Fused cells are cultured under conditions to generate new
cell walls and then to allow sexual recombination to occur. Cells with recombinant genomes are then selected (e.g., by
selecting for complementation of auxotrophic markers present on the respective parent strains). Cells with hybrid
genomes are more likely to have acquired the genes necessary for a sexual cycle. Protoplasts of cells can then be crossed
with killed protoplasts of a further population of cells known to have a sexual cycle (the same as or different from the previous
round) in the same manner, followed by selection for cells with hybrid genomes.

[0208] Another desired property is the production of a mutator strain of fungi. Such a fungus can be produced by
shuffling a fungal strain containing a marker gene with one or more mutations that impair or prevent expression of a
functional product. Shufflants are propagated under conditions that select for expression of the positive marker (while
allowing a small amount of residual growth without expression). Shufflants growing fastest are selected to form the
starting materials for the next round of shuffling.

[0209] Another desired property is to expand the host range of a fungus so it can form heterokaryons with fungi from
other vegetative compatibility groups. Incompatibility between species results from the interactions of specific alleles
at different incompatibility loci (such as the "het" loci). If two strains undergo hyphal anastomosis, a lethal cytoplasmic
incompatibility reaction may occur if the strains differ at these loci. Strains must carry identical loci to be entirely
compatible. Several of these loci have been identified in various species, and the incompatibility effect is somewhat additive
(hence, "partial incompatibility" can occur). Some tolerant and het-negative mutants have been described for these
organisms (e.g., Dales & Croft, J. Gen. Microbiol. 136, 1717-1724 (1990)). Further, a tolerance gene (tol) has been
reported, which suppresses mating-type heterokaryon incompatibility. Shuffling is performed between protoplasts of
strains from different incompatibility groups. A preferred format uses a live acceptor strain and a UV-irradiated dead
donor strain. The UV irradiation serves to introduce mutations into DNA, inactivating het genes. The two strains should
bear different genetic markers. Protoplasts of the strains are fused, and cells are regenerated and screened for
complementation of markers. Subsequent rounds of shuffling and selection can be performed in the same manner by fusing the
cells surviving screening with protoplasts of a fresh population of donor cells. Similar to other procedures noted herein,
the cells resulting from regeneration of the protoplasts are optionally re-fused by protoplasting and regenerated into cells
one or more times prior to any selection step to increase the diversity of the resulting population of cells to be screened.

[0210] Another desired property is the introduction of multiple-allelomorph heterothallism into Ascomycetes and Fungi
imperfecti, which do not normally exhibit this property. This mating system allows outbreeding without self-breeding.
Such a mating system can be introduced by shuffling Ascomycetes and Fungi imperfecti with DNA from Gasteromycetes
or Hymenomycetes, which have such a system.

[0211] Another desired property is spontaneous formation of protoplasts to facilitate use of a fungal strain as a shuffling
host. Here, the fungus to be evolved is typically mutagenized. Spores of the fungus to be evolved are briefly treated with
a cell-wall degrading agent for a time insufficient for complete protoplast formation, and are mixed with protoplasts from
other strain(s) of fungi. Protoplasts formed by fusion of the two different subpopulations are identified by genetic or other
selection or screening as described above. These protoplasts are used to regenerate mycelia and then spores, which
form the starting material for the next round of shuffling. In the next round, at least some of the surviving spores are
treated with cell-wall removing enzyme but for a shorter time than in the previous round. After treatment, the partially
stripped cells are labeled with a first label. These cells are then mixed with protoplasts, which may derive from other
cells surviving selection in a previous round, or from a fresh strain of fungi. These protoplasts are physically labeled with
a second label. After incubating the cells under conditions for protoplast fusion, fusants with both labels are selected.
These fusants are used to generate mycelia and spores for the next round of shuffling, and so forth. Eventually, progeny
that spontaneously form protoplasts (i.e., without addition of cell wall degrading agent) are identified. As with other
procedures noted herein, cells or protoplasts can be reiteratively fused and regenerated prior to performing any selection
step to increase the diversity of the resulting cells or protoplasts to be screened. Similarly, selected cells or protoplasts
can be reiteratively fused and regenerated for one or several cycles without imposing selection on the resulting cellular
or protoplast populations, thereby increasing the diversity of cells or protoplasts which are eventually screened. This
process of performing multiple cycles of recombination interspersed with selection steps can be reiteratively repeated
as desired.
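
The two bookkeeping steps in the preceding paragraph (shortening the cell-wall treatment from round to round, and keeping only fusants that carry both labels) can be illustrated with a minimal Python sketch. The names Particle, select_fusants and digestion_schedule, the geometric shortening factor, and the example times are all hypothetical and are not taken from the specification; the sketch only models the selection logic, not any laboratory procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Particle:
    """A cell or protoplast scored for two hypothetical labels."""
    ident: str
    first_label: bool   # label applied to the partially stripped cells
    second_label: bool  # label applied to the partner protoplasts


def select_fusants(particles):
    """Keep only particles carrying both labels (putative fusants)."""
    return [p for p in particles if p.first_label and p.second_label]


def digestion_schedule(initial_minutes, factor, rounds):
    """Return per-round treatment times, each shorter than the last.

    The geometric shortening is an assumption made for illustration;
    the text only requires that each round be shorter than the previous one.
    """
    if not 0 < factor < 1:
        raise ValueError("factor must be between 0 and 1")
    return [initial_minutes * factor ** r for r in range(rounds)]
```

For example, digestion_schedule(60.0, 0.5, 3) yields treatment times of 60, 30 and 15 minutes for three successive rounds.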

[0212] Another desired property is the acquisition and/or improvement of genes encoding enzymes in biosynthetic
pathways, genes encoding transporter proteins, and genes encoding proteins involved in metabolic flux control. In this
situation, genes of the pathway can be introduced into the fungus to be evolved either by genetic exchange with another
strain of fungus possessing the pathway or by introduction of a fragment library from an organism possessing the
pathway. Genetic material of these fungi can then be subjected to further shuffling and screening/selection by the various
procedures discussed in this application. Shufflant strains of fungi are selected/screened for production of the compound
produced by the metabolic pathway or precursors thereof.

[0213] Another desired property is increasing the stability of fungi to extreme conditions such as heat. In this situation,
genes conferring stability can be acquired by exchanging DNA with or transforming DNA from a strain that already has
such properties. Alternatively, the strain to be evolved can be subjected to random mutagenesis. Genetic material of
the fungus to be evolved can be shuffled by any of the procedures described in this application, with shufflants being
selected by surviving exposure to extreme conditions.

[0214] Another desired property is capacity of a fungus to grow under altered nutritional requirements (e.g., growth
on particular carbon or nitrogen sources). Altering nutritional requirements is particularly valuable, e.g., for natural isolates
of fungi that produce valuable commercial products but have esoteric and therefore expensive nutritional requirements.
The strain to be evolved undergoes genetic exchange and/or transformation with DNA from a strain that has the desired
nutritional requirements. The fungus to be evolved can then optionally be subjected to further shuffling as described in
this application, with recombinant strains being selected for capacity to grow in the desired nutritional circumstances.
Optionally, the nutritional circumstances can be varied in successive rounds of shuffling, starting at close to the natural
requirements of the fungus to be evolved and in subsequent rounds approaching the desired nutritional requirements.
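
One way to picture the stepwise variation of nutritional circumstances described above is as an interpolation between a starting medium and a target medium. The following Python sketch assumes a linear interpolation and uses hypothetical component names and concentrations; the specification does not prescribe any particular schedule.

```python
def medium_schedule(natural, desired, rounds):
    """Linearly interpolate medium composition from natural to desired.

    `natural` and `desired` map component names to concentrations.
    Linear stepping is an illustrative assumption only.
    """
    if rounds < 2:
        raise ValueError("need at least two rounds")
    if natural.keys() != desired.keys():
        raise ValueError("both media must list the same components")
    schedule = []
    for r in range(rounds):
        t = r / (rounds - 1)  # 0.0 in the first round, 1.0 in the last
        schedule.append(
            {k: (1 - t) * natural[k] + t * desired[k] for k in natural}
        )
    return schedule
```

In use, the first entry of the returned list equals the natural medium and the last entry equals the desired medium, with intermediate rounds falling in between.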
[0215] Another desired property is acquisition of natural competence in a fungus. The procedure for acquisition of
natural competence by shuffling is generally described in PCT/US97/04494. The fungus to be evolved typically undergoes
genetic exchange or transformation with DNA from a bacterial strain or fungal strain that already has this property. Cells
with recombinant genomes are then selected by capacity to take up a plasmid bearing a selective marker. Further rounds
of recombination and selection can be performed using any of the procedures described above.
[0216] Another desired property is reduced or increased secretion of proteases and DNase. In this situation, the fungus
to be evolved can acquire DNA by exchange or transformation from another strain known to have the desired property.
Alternatively, the fungus to be evolved can be subjected to random mutagenesis. The fungus to be evolved is shuffled as
above. The presence of such enzymes, or lack thereof, can be assayed by contacting the culture media from individual
isolates with a fluorescent molecule tethered to a support via a peptide or DNA linkage. Cleavage of the linkage releases
detectable fluorescence to the media.

[0217] Another desired property is producing fungi with altered transporters (e.g., MDR). Such altered transporters
are useful, for example, in fungi that have been evolved to produce new secondary metabolites, to allow entry of precursors
required for synthesis of the new secondary metabolites into a cell, or to allow efflux of the secondary metabolite from
the cell. Transporters can be evolved by introduction of a library of transporter variants into fungal cells and allowing the
cells to recombine by sexual or parasexual recombination. To evolve a transporter with capacity to transport a precursor
into the cells, cells are propagated in the presence of precursor, and cells are then screened for production of metabolite.
To evolve a transporter with capacity to export a metabolite, cells are propagated under conditions supporting production
of the metabolite, and screened for export of metabolite to culture medium. A general method of fungal shuffling is shown
in Fig. 7. Spores from a frozen stock, a lyophilized stock, or fresh from an agar plate are used to inoculate suitable liquid
medium (1). Spores are germinated, resulting in hyphal growth (2). Mycelia are harvested and washed by filtration and/or
centrifugation. Optionally, the sample is pretreated with DTT to enhance protoplast formation (3). Protoplasting is
performed in an osmotically stabilizing medium (e.g., 1 M NaCl/20 mM MgSO4, pH 5.8) by the addition of cell wall-degrading
enzyme (e.g., Novozyme 234) (4). Cell wall-degrading enzyme is removed by repeated washing with osmotically stabilizing
solution (5). Protoplasts can be separated from mycelia, debris and spores by filtration through miracloth and density
centrifugation (6). Protoplasts are harvested by centrifugation and resuspended to the appropriate concentration. This
step may lead to some protoplast fusion (7). Fusion can be stimulated by addition of PEG (e.g., PEG 3350), and/or
repeated centrifugation and resuspension with or without PEG. Electrofusion can also be performed (8). Fused protoplasts
can optionally be enriched from unfused protoplasts by sucrose gradient sedimentation (or other methods of screening
described above). Fused protoplasts can optionally be treated with ultraviolet irradiation to stimulate recombination (9).
Protoplasts are cultured on osmotically stabilized agar plates to regenerate cell walls and form mycelia (10). The mycelia
are used to generate spores (11), which are used as the starting material in the next round of shuffling (12).
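
The recursive character of the general method (fusion and recombination, regeneration, selection, and reuse of the survivors as starting material for the next round) can be illustrated with a toy computational model. In this Python sketch a genome is a list of 0/1 alleles, fusion plus recombination is reduced to a single-point crossover between two randomly chosen parents, and selection keeps the highest-scoring half of the offspring. All of these simplifications, and the names recombine, shuffle_round and recursive_shuffle, are assumptions made for illustration and do not model the wet-laboratory steps.

```python
import random


def recombine(a, b, rng):
    """Single-point crossover between two equal-length genomes."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def shuffle_round(population, fitness, rng, keep_fraction=0.5):
    """One round: random pairwise recombination, then selection."""
    offspring = [
        recombine(rng.choice(population), rng.choice(population), rng)
        for _ in range(len(population) * 2)
    ]
    # Selection step: rank offspring by the screened property.
    offspring.sort(key=fitness, reverse=True)
    n_keep = max(2, int(len(offspring) * keep_fraction))
    return offspring[:n_keep]


def recursive_shuffle(population, fitness, rounds, seed=0):
    """Repeat shuffle_round, feeding survivors into the next round."""
    rng = random.Random(seed)
    for _ in range(rounds):
        population = shuffle_round(population, fitness, rng)
    return population
```

Because this model includes no mutation, every allele in the final population was already present at the same position in some starting genome; recombination only redistributes existing diversity.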

[0218] Selection for a desired property can be performed either on regenerated mycelia or spores derived therefrom. 
[0219] In an alternative method, protoplasts are formed by inhibition of one or more enzymes required for cell wall
synthesis (see Fig. 8). The inhibitor should be fungistatic rather than fungicidal under the conditions of use. Examples
of inhibitors include antifungal compounds described by, e.g., Georgopapadakou & Walsh, Antimicrob. Ag. Chemother.
40, 279-291 (1996), and Lyman & Walsh, Drugs 44, 9-35 (1992). Other examples include chitin synthase inhibitors (polyoxin
or nikkomycin compounds) and/or glucan synthase inhibitors (e.g., echinocandins, papulocandins, pneumocandins).
Inhibitors should be applied in osmotically stabilized medium. Cells stripped of their cell walls can be fused or otherwise
employed as donors or hosts in genetic transformation/strain development programs. A possible scheme utilizing this
method reiteratively is outlined in Figure 8.

[0220] In a further variation, protoplasts are prepared using strains of fungi which are genetically deficient or
compromised in their ability to synthesize intact cell walls (see Fig. 9). Such mutants are generally referred to as fragile,
osmotic-remedial, or cell wall-less, and are obtainable from strain depositories. Examples of such strains include
Neurospora crassa os mutants (Selitrennikoff, Antimicrob. Agents Chemother. 23, 757-765 (1983)). Some such mutations
are temperature-sensitive. Temperature-sensitive strains can be propagated at the permissive temperature for purposes
of selection and amplification and at a nonpermissive temperature for purposes of protoplast formation and fusion. A
temperature-sensitive Neurospora crassa os strain has been described which propagates as protoplasts when
grown in osmotically stabilizing medium containing sorbose and polyoxin at nonpermissive temperature but generates
whole cells on transfer to medium containing sorbitol at a permissive temperature. See US 4,873,196.
[0221] Other suitable strains can be produced by targeted mutagenesis of genes involved in chitin synthesis, glucan
synthesis and other cell wall-related processes. Examples of such genes include CHT1, CHT2 and CAL1 (or CSD2) of
Saccharomyces cerevisiae and Candida spp. (Georgopapadakou & Walsh 1996); ETG1/FKS1/CND1/CWH53/PBR1
and homologs in S. cerevisiae, Candida albicans, Cryptococcus neoformans, and Aspergillus fumigatus; and ChvA/NdvA of
Agrobacterium and Rhizobium. Other examples are orlA, orlB, orlC, orlD, tsE, and bimG of Aspergillus nidulans (Borgia, J.
Bacteriol. 174, 377-389 (1992)). Strains of A. nidulans containing orlA1 or tsE1 mutations lyse at restrictive temperatures.
Lysis of these strains may be prevented by osmotic stabilization, and the mutations may be complemented by the addition
of N-acetylglucosamine (GlcNAc). bimG11 mutations are ts for a type 1 protein phosphatase (germlings of strains carrying
this mutation lack chitin, and conidia swell and lyse). Other suitable genes are chsA, chsB, chsC, chsD and chsE of
Aspergillus fumigatus; chs1 and chs2 of Neurospora crassa; Phycomyces blakesleeanus MM; and chs1, 2 and 3 of S.
cerevisiae. Chs1 is a non-essential repair enzyme; chs2 is involved in septum formation and chs3 is involved in cell wall
maturation and bud ring formation.

[0222] Other useful strains include S. cerevisiae CLY (cell lysis) mutants such as ts strains (Paravicini et al., Mol. Cell
Biol. 12, 4896-4905 (1992)), and the CLY 15 strain which harbors a PKC1 gene deletion. Other useful strains include
strain VY 1160 containing a ts mutation in srb (encoding actin) (Schade et al., Acta Histochem. Suppl. 41, 193-200
(1991)), and a strain with an ses mutation which results in increased sensitivity to cell-wall digesting enzymes isolated
from snail gut (Metha & Gregory, Appl. Environ. Microbiol. 41, 992-999 (1981)). Useful strains of C. albicans include
those with mutations in chs1, chs2, or chs3 (encoding chitin synthetases), such as osmotic remedial conditional lethal
mutants described by Payton & de Tiani, Curr. Genet. 17, 293-296 (1990); C. utilis mutants with increased sensitivity
to cell-wall digesting enzymes isolated from snail gut (Metha & Gregory, 1981, supra); and N. crassa mutants os-1,
os-2, os-3, os-4, os-5, and os-6. See Selitrennikoff, Antimicrob. Agents Chemother. 23, 757-765 (1983). Such mutants
grow and divide without a cell wall at 37°C, but at 22°C produce a cell wall.

[0223] Targeted mutagenesis can be achieved by transforming cells with a positive-negative selection vector containing
homologous regions flanking a segment to be targeted, a positive selection marker between the homologous regions
and a negative selection marker outside the homologous regions (see Capecchi, US 5,627,059). In a variation, the
negative selection marker can be an antisense transcript of the positive selection marker (see US 5,527,674).
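
The logic of positive-negative selection can be summarized in a few lines of Python. The sketch below assumes the usual reading of such vectors: homologous (targeted) integration retains the positive marker located between the homologous regions and loses the negative marker located outside them, whereas random integration tends to retain both. The function names and category labels are hypothetical.

```python
def survives_selection(has_positive, has_negative):
    """A cell survives only with the positive and without the negative marker."""
    return has_positive and not has_negative


def classify(has_positive, has_negative):
    """Interpret the marker content of a transformed cell."""
    if not has_positive:
        return "no integration"
    if has_negative:
        return "random integration"
    return "targeted integration"
```

Under these assumptions, only cells classified as "targeted integration" pass both selection steps.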
[0224] Other suitable cells can be selected by random mutagenesis or shuffling procedures in combination with
selection. For example, a first subpopulation of cells is mutagenized, allowed to recover from mutagenesis, subjected to
incomplete degradation of cell walls and then contacted with protoplasts of a second subpopulation of cells. Hybrid
cells bearing markers from both subpopulations are identified (as described above) and used as the starting materials
in a subsequent round of shuffling. This selection scheme selects both for cells with capacity for spontaneous protoplast
formation and for cells with enhanced recombinogenicity.

[0225] In a further variation, cells having capacity for spontaneous protoplast formation can be crossed with cells
having enhanced recombinogenicity evolved using other methods of the invention. The hybrid cells are particularly
suitable hosts for whole genome shuffling.

[0226] Cells with mutations in enzymes involved in cell wall synthesis or maintenance can undergo fusion simply as
a result of propagating the cells in osmotic-protected culture due to spontaneous protoplast formation. If the mutation is
conditional, cells are shifted to a nonpermissive condition. Protoplast formation and fusion can be accelerated by addition
of promoting agents, such as PEG or an electric field (see Philipova & Venkov, Yeast 6, 205-212 (1990); Tsoneva et
al., FEMS Microbiol. Lett. 51, 61-65 (1989)).

5. Targeted Shuffling — Hot Spots 

[0227] In one aspect, targeted homologous genes are cloned into specific regions of the genome (e.g., by homologous
recombination or other targeting procedures) which are known to be recombination "hot spots" (i.e., regions showing
elevated levels of recombination compared to the average level of recombination observed across an entire genome),
or known to be proximal to such hot spots. The resulting recombinant strains are mated recursively. During meiotic
recombination, homologous recombinant genes recombine, thereby increasing the diversity of the genes. After several
cycles of recombination by recursive mating, the resulting cells are screened.
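
The definition of a hot spot given above (a region whose recombination level is elevated relative to the genome-wide average) translates directly into a simple comparison. The Python sketch below is illustrative only: the region names, the rate values and the two-fold threshold are assumptions, since the text does not specify how large the elevation must be.

```python
def find_hot_spots(rates, fold=2.0):
    """Return regions whose rate exceeds `fold` times the genome-wide mean.

    `rates` maps a region name to its observed recombination rate.
    The default two-fold threshold is an arbitrary illustrative choice.
    """
    if not rates:
        return []
    mean_rate = sum(rates.values()) / len(rates)
    return sorted(r for r, v in rates.items() if v > fold * mean_rate)
```

A region flagged in this way, or one adjacent to it, would be a candidate site for inserting the homologous genes to be shuffled.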

6. Shuffling Methods in Yeast 

[0228] Yeasts are a subgroup of fungi that grow as single cells. Yeasts are used for the production of fermented
beverages and leavening, for production of ethanol as a fuel, for production of low molecular weight compounds, and for the
heterologous production of proteins and enzymes (see accompanying list of yeast strains and their uses). Commonly used strains of
yeast include Saccharomyces cerevisiae, Pichia sp., Candida sp., and Schizosaccharomyces pombe.
[0229] Several types of vectors are available for cloning in yeast, including integrative plasmid (YIp), yeast replicating
plasmid (YRp, such as the 2μ circle based vectors), yeast episomal plasmid (YEp), yeast centromeric plasmid (YCp),
or yeast artificial chromosome (YAC). Each vector can carry markers useful to select for the presence of the plasmid,
such as LEU2, URA3, and HIS3, or the absence of the plasmid, such as URA3 (a gene that is toxic to cells grown in the
presence of 5-fluoro orotic acid).

[0230] Many yeasts have both sexual and asexual (vegetative) cycles. The sexual cycle involves the recombination
of the whole genome of the organism each time the cell passes through meiosis. For example, when diploid cells of S.
cerevisiae are exposed to nitrogen and carbon limiting conditions, diploid cells undergo meiosis to form asci. Each ascus
holds four haploid spores, two of mating type "a" and two of mating type "α." Upon return to rich medium, haploid spores
of opposite mating type mate to form diploid cells once again. Ascospores of opposite mating type can mate within the
ascus, or if the ascus is degraded, for example with zymolyase, the haploid cells are liberated and can mate with spores
from other asci. This sexual cycle provides a format to shuffle endogenous genomes of yeast and/or exogenous fragment
libraries inserted into yeast vectors. This process results in swapping or accumulation of hybrid genes, and in the
shuffling of homologous sequences shared by mating cells.
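
The 2:2 segregation of mating types within an ascus, and the pairing of liberated spores of opposite type, can be illustrated with a short Python sketch. The functions sporulate and mate are hypothetical names, mating is reduced to counting opposite-type pairs, and the model ignores everything except mating type.

```python
import random


def sporulate(rng):
    """Return the four spores of one ascus: two 'a' and two 'alpha'."""
    spores = ["a", "a", "alpha", "alpha"]
    rng.shuffle(spores)  # order within the ascus is arbitrary here
    return spores


def mate(spores):
    """Pair spores of opposite mating type.

    Returns (number of diploids formed, number of spores left unmated).
    """
    n_a = spores.count("a")
    n_alpha = spores.count("alpha")
    pairs = min(n_a, n_alpha)
    return pairs, len(spores) - 2 * pairs
```

Pooling the spores of two asci, for instance, gives four "a" and four "alpha" spores, so up to four diploids can form, including diploids whose parents came from different asci.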

[0231] Yeast strains having mutations in several known genes have properties useful for shuffling. These properties
include increasing the frequency of recombination and increasing the frequency of spontaneous mutations within a cell.
These properties can be the result of mutation of a coding sequence or altered expression (usually overexpression) of
a wildtype coding sequence. The HO nuclease effects the transposition of HMLa/α and HMRa/α to the MAT locus,
resulting in mating type switching. Mutants in the gene encoding this enzyme do not switch their mating type and can
be employed to force crossing between strains of defined genotype, such as ones that harbor a library or have a desired
phenotype, and to prevent inbreeding of starter strains. PMS1, MLH1, MSH2, and MSH6 are involved in mismatch repair.
Mutations in these genes all have a mutator phenotype (Chambers et al., Mol. Cell. Biol. 16, 6110-6120 (1996)). Mutations
in TOP3 DNA topoisomerase confer a 6-fold enhancement of interchromosomal homologous recombination (Bailis et al.,
Molecular and Cellular Biology 12, 4988-4993 (1992)). The RAD50-57 genes confer resistance to radiation. RAD3
functions in excision of pyrimidine dimers. RAD52 functions in gene conversion. RAD50, MRE11, and XRS2 function in both
homologous recombination and illegitimate recombination. HOP1 and RED1 function in early meiotic recombination
(Mao-Draayer, Genetics 144, 71-86). Mutations in either HOP1 or RED1 reduce double stranded breaks at the HIS2
recombination hotspot. Strains deficient in these genes are useful for maintaining stability in hyper-recombinogenic constructs
such as tandem expression libraries carried on YACs. Mutations in HPR1 are hyperrecombinogenic. HDF1 has DNA
end binding activity and is involved in double stranded break repair and V(D)J recombination. Strains bearing this mutation
are useful for transformation with random genomic fragments by either protoplast fusion or electroporation. Kar-1 is a
dominant mutation that prevents karyogamy. Kar-1 mutants are useful for the directed transfer of single chromosomes
from a donor to a recipient strain. This technique has been widely used in the transfer of YACs between strains, and is
also useful in the transfer of evolved genes/chromosomes to other organisms (Markie, YAC Protocols (Humana Press,
Totowa, NJ, 1996)). HOT1 is an S. cerevisiae recombination hotspot within the promoter and enhancer region of the
rDNA repeat sequences. This locus induces mitotic recombination at adjacent sequences, presumably due to its high
level of transcription. Genes and/or pathways inserted under the transcriptional control of this region undergo increased
mitotic recombination. The regions surrounding the arg4 and his4 genes are also recombination hot spots, and genes
cloned in these regions have an increased probability of undergoing recombination during meiosis. Homologous genes
can be cloned in these regions and shuffled in vivo by recursively mating the recombinant strains. CDC2 encodes
polymerase δ and is necessary for mitotic gene conversion. Overexpression of this gene can be used in a shuffler or
mutator strain. A temperature-sensitive mutation in CDC4 halts the cell cycle at G1 at the restrictive temperature and
could be used to synchronize protoplasts for optimized fusion and subsequent recombination.
[0232] As with filamentous fungi, the general goals of shuffling yeast include improvement in yeast as a host organism 
for genetic manipulation, and as a production apparatus for various compounds. One desired property in either case is 
to improve the capacity of yeast to express and secrete a heterologous protein. The following example describes the
use of shuffling to evolve yeast to express and secrete increased amounts of RNase A.

[0233] RNase A catalyzes the cleavage of the P-O5' bond of RNA specifically after pyrimidine nucleotides. The enzyme
is a basic 124 amino acid polypeptide that has 8 half cystine residues, each required for catalysis. YEpWL-RNase A is
a vector that effects the expression and secretion of RNase A from the yeast S. cerevisiae, and yeast harboring this
vector secrete 1-2 mg of recombinant RNase A per liter of culture medium (del Cardayré et al., Protein Engineering
8(3):261-273 (1995)). This overall yield is poor for a protein heterologously expressed in yeast and can be improved at
least 10-100 fold by shuffling. The expression of RNase A is easily detected by several plate and microtitre plate assays
(del Cardayré & Raines, Biochemistry 6031-6037 (1994)). Each of the described formats for whole genome shuffling
can be used to shuffle a strain of S. cerevisiae harboring YEpWL-RNase A, and the resulting cells can be screened for
the increased secretion of RNase A into the medium. The new strains are cycled recursively through the shuffling format,
until sufficiently high levels of RNase A secretion are observed. The use of RNase A is particularly useful since it not only
requires proper folding and disulfide bond formation but also proper glycosylation. Thus numerous components of the
expression, folding, and secretion systems can be optimized. The resulting strain is also evolved for improved secretion
of other heterologous proteins.

[0234] Another goal of shuffling yeast is to increase the tolerance of yeast to ethanol. Such tolerance is useful both for the
commercial production of ethanol, and for the production of more alcoholic beers and wines. The yeast strain to be
shuffled acquires genetic material by exchange or transformation with other strain(s) of yeast, which may or may not be
known to have superior resistance to ethanol. The strain to be evolved is shuffled and shufflants are selected for capacity
to survive exposure to ethanol. Increasing concentrations of ethanol can be used in successive rounds of shuffling. The
same principles can be used to shuffle baking yeasts for improved osmotolerance.
[0235] Another desired property of shuffled yeast is capacity to grow under desired nutritional conditions. For example,
it is useful for yeast to grow on cheap carbon sources such as methanol, starch, molasses, cellulose, cellobiose, or xylose
depending on availability. The principles of shuffling and selection are similar to those discussed for filamentous fungi.
[0236] Another desired property is capacity to produce secondary metabolites naturally produced by filamentous fungi
or bacteria. Examples of such secondary metabolites are cyclosporin A, taxol, and cephalosporins. The yeast to be
evolved undergoes genetic exchange or is transformed with DNA from organism(s) that produce the secondary metabolite.
For example, fungi producing taxol include Taxomyces andreanae and Pestalotiopsis microspora (Stierle et al.,
Science 260, 214-216 (1993); Strobel et al., Microbiol. 142, 435-440 (1996)). DNA can also be obtained from trees that
naturally produce taxol, such as Taxus brevifolia. DNA encoding one enzyme in the taxol pathway, taxadiene synthase,
which it is believed catalyzes the committed step in taxol biosynthesis and may be rate limiting in overall taxol production,
has been cloned (Wildung & Croteau, J. Biol. Chem. 271, 9201-4 (1996)). The DNA is then shuffled, and shufflants are
screened/selected for production of the secondary metabolite. For example, taxol production can be monitored using
antibodies to taxol, by mass spectroscopy or UV spectrophotometry. Alternatively, production of intermediates in taxol
synthesis or enzymes in the taxol synthetic pathway can be monitored. Concetti & Ripani, Biol. Chem. Hoppe Seyler
375, 419-23 (1994). Other examples of secondary metabolites are polyols, amino acids, polyketides, non-ribosomal
polypeptides, ergosterol, carotenoids, terpenoids, sterols, vitamin E, and the like.

[0237] Another desired property is to increase the flocculence of yeast to facilitate separation in preparation of ethanol. 
Yeast can be shuffled by any of the procedures noted above with selection for shuffled yeast forming the largest clumps. 


7. Exemplary procedure for yeast protoplasting 

[0238] Protoplast preparation in yeast is reviewed by Morgan, in Protoplasts (Birkhauser Verlag, Basel, 1983). Fresh 
cells (-10^) are washed with buffer, for example 0.1 M potassium phosphate, then resuspended in this same buffer 

containing a reducing agent, such as 50 mM DTT, incubated for 1 h at 30°C with gentle agitation, and then washed
again with buffer to remove the reducing agent. These cells are then resuspended in buffer containing a cell wall degrading
enzyme, such as Novozyme 234 (1 mg/mL), and any of a variety of osmotic stabilizers, such as sucrose, sorbitol, NaCl,
KCl, MgSO4, MgCl2, or NH4Cl at any of a variety of concentrations. These suspensions are then incubated at 30°C with
gentle shaking (~60 rpm) until protoplasts are released. To generate protoplasts that are more likely to produce productive
fusants several strategies are possible.

[0239] Protoplast formation can be increased if the cell cycle of the protoplasts has been synchronized to be halted
at G1. In the case of S. cerevisiae this can be accomplished by the addition of mating factors, either a or α (Curran &
Carter, J. Gen. Microbiol. 129, 1589-1591 (1983)). These peptides act as adenylate cyclase inhibitors which by decreasing
the cellular level of cAMP arrest the cell cycle at G1. In addition, sex factors have been shown to induce the weakening
of the cell wall in preparation for the sexual fusion of a and α cells (Crandall & Brock, Bacteriol. Rev. 32, 139-163 (1968);
Osumi et al., Arch. Microbiol. 97, 27-38 (1974)). Thus in the preparation of protoplasts, cells can be treated with mating
factors or other known inhibitors of adenylate cyclase, such as leflunomide or the killer toxin from K. lactis, to arrest them
at G1 (Sugisaki et al., Nature 304, 464-466 (1983)). Then after fusing of the protoplasts (step 2), cAMP can be added
to the regeneration medium to induce S-phase and DNA synthesis. Alternatively, yeast strains having a temperature
sensitive mutation in the CDC4 gene can be used, such that cells could be synchronized and arrested at G1. After fusion,
cells are returned to the permissive temperature so that DNA synthesis and growth resume.

[0240] Once suitable protoplasts have been prepared, it is necessary to induce fusion by physical or chemical means. 
An equal number of protoplasts of each cell type is mixed in phosphate buffer (0.2 M, pH 5.8, 2 x 10^ cells/mL) containing 
an osmotic stabilizer, for example 0.8 M NaCl, and PEG 6000 (33% w/v) and then incubated at 30°C for 5 min while
fusion occurs. Polyols, or other compounds that bind water, can be employed. The fusants are then washed and resuspended
in the osmotically stabilized buffer lacking PEG, and transferred to osmotically stabilized regeneration medium
on/in which the cells can be selected or screened for a desired property. 
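
As a worked illustration of the fusion mixture described above, the following minimal sketch converts the stated concentrations (0.8 M NaCl and 33% w/v PEG 6000) into masses for a chosen volume. The function name, the 10 mL example volume, and the rounding are assumptions made only for illustration; the molar mass of NaCl (58.44 g/mol) is the standard value.

```python
# Standard molar mass of NaCl in g/mol.
NACL_G_PER_MOL = 58.44


def fusion_mix_masses(volume_ml, nacl_molar=0.8, peg_percent_wv=33.0):
    """Return grams of NaCl and PEG 6000 for `volume_ml` of fusion mixture.

    Hypothetical helper: defaults follow the example concentrations in the text.
    """
    litres = volume_ml / 1000.0
    nacl_g = nacl_molar * litres * NACL_G_PER_MOL
    # % w/v means grams of solute per 100 mL of final solution.
    peg_g = peg_percent_wv * volume_ml / 100.0
    return {"nacl_g": round(nacl_g, 3), "peg_g": round(peg_g, 3)}


# Example: 10 mL of fusion mixture (volume chosen arbitrarily).
masses = fusion_mix_masses(10)
print(masses)
```

Other osmotic stabilizers can be handled the same way by substituting the relevant molar mass and concentration.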

8. Shuffling Methods Using Artificial Chromosomes


[0241] Yeast artificial chromosomes (YACs) are yeast vectors into which very large DNA fragments (e.g., 50-2000 kb)
can be cloned (see, e.g., Monaco & Larin, Trends Biotech. 12(7), 280-286 (1994); Ramsay, Mol. Biotechnol. 1(2),
181-201 (1994); Huxley, Genet. Eng. 16, 65-91 (1994); Jakobovits, Curr. Biol. 4(8), 761-3 (1994); Lamb & Gearhart, Curr.
Opin. Genet. Dev. 5(3), 342-8 (1995); Montoliu et al., Reprod. Fertil. Dev. 6, 577-84 (1994)). These vectors have
telomeres (Tel), a centromere (Cen), an autonomously replicating sequence (ARS), and can have genes for positive
(e.g., TRP1) and negative (e.g., URA3) selection. YACs are maintained, replicated, and segregate as other yeast
chromosomes through both meiosis and mitosis, thereby providing a means to expose cloned DNA to true meiotic
recombination.

[0242] YACs provide a vehicle for the shuffling of libraries of large DNA fragments in vivo. The substrates for shuffling
are typically large fragments from 20 kb to 2 Mb. The fragments can be random fragments or can be fragments known
to encode a desirable property. For example, a fragment might include an operon of genes involved in production of
antibiotics. Libraries can also include whole genomes or chromosomes. Viral genomes and some bacterial genomes
can be cloned intact into a single YAC. In some libraries, fragments are obtained from a single organism. Other libraries
include fragment variants, as where some libraries are obtained from different individuals or species. Fragment variants
can also be generated by induced mutation. Typically, genes within fragments are expressed from naturally associated
regulatory sequences within yeast. However, alternatively, individual genes can be linked to yeast regulatory elements
to form an expression cassette, and a concatemer of such cassettes, each containing a different gene, can be inserted
into a YAC.

[0243] In some instances, fragments are incorporated into the yeast genome, and shuffling is used to evolve improved 
yeast strains. In other instances, fragments remain as components of YACs throughout the shuffling process, and after 
acquisition of a desired property, the YACs are transferred to a desired recipient cell. 

9. Methods of Evolving Yeast Strains

[0244] Fragments are cloned into a YAC vector, and the resulting YAC library is transformed into competent yeast
cells. Transformants containing a YAC are identified by selecting for a positive selection marker present on the YAC.
The cells are allowed to recover and are then pooled. Thereafter, the cells are induced to sporulate by transferring the
cells from rich medium to nitrogen and carbon limiting medium. In the course of sporulation, cells undergo meiosis.
Spores are then induced to mate by return to rich media. Optionally, asci are lysed to liberate spores, so that the spores
can mate with other spores originating from other asci. Mating results in recombination between YACs bearing different
inserts, and between YACs and natural yeast chromosomes. The latter can be promoted by irradiating spores with
ultraviolet light. Recombination can give rise to new phenotypes either as a result of genes expressed by fragments on the
YACs or as a result of recombination with host genes, or both.

[0245] After induction of recombination between YACs and natural yeast chromosomes, YACs are often eliminated
by selecting against a negative selection marker on the YACs. For example, YACs containing the marker URA3 can be
selected against by propagation on media containing 5-fluoro-orotic acid. Any exogenous or altered genetic material that
remains is contained within natural yeast chromosomes. Optionally, further rounds of recombination between natural
yeast chromosomes can be performed after elimination of YACs. Optionally, the same or different library of YACs can
be transformed into the cells, and the above steps repeated. By recursively repeating this process, the diversity of the
population is increased prior to screening.

[0246] After elimination of YACs, yeast are then screened or selected for a desired property. The property can be a
new property conferred by transferred fragments, such as production of an antibiotic. The property can also be an
improved property of the yeast such as improved capacity to express or secrete an exogenous protein, improved
recombinogenicity, improved stability to temperature or solvents, or other property required of commercial or research
strains of yeast.

[0247] Yeast strains surviving selection/screening are then subject to a further round of recombination. Recombination
can be exclusively between the chromosomes of yeast surviving selection/screening. Alternatively, a library of fragments
can be introduced into the yeast cells and recombined with endogenous yeast chromosomes as before. This library of
fragments can be the same or different from the library used in the previous round of transformation. For example, the
YACs could contain a library of genomic DNA isolated from a pool of the improved strains obtained in the earlier steps.
YACs are eliminated as before, followed by additional rounds of recombination and/or transformation with further YAC
libraries. Recombination is followed by another round of selection/screening, as above. Further rounds of
recombination/screening can be performed as needed until a yeast strain has evolved to acquire the desired property.

[0248] An exemplary scheme for evolving yeast by introduction of a YAC library is shown in Fig. 10. The first part of
the figure shows yeast containing an endogenous diploid genome and a YAC library of fragments representing variants
of a sequence. The library is transformed into the cells to yield 100-1000 colonies per µg DNA. Most transformed yeast
cells now harbor a single YAC as well as endogenous chromosomes. Meiosis is induced by growth on nitrogen and
carbon limiting medium. In the course of meiosis the YACs recombine with other chromosomes in the same cell. Haploid
spores resulting from meiosis mate and regenerate diploid forms. The diploid forms now harbor recombinant
chromosomes, parts of which come from endogenous chromosomes and parts from YACs. Optionally, the YACs can now be
cured from the cells by selecting against a negative selection marker present on the YACs. Irrespective of whether YACs
are selected against, cells are then screened or selected for a desired property. Cells surviving selection/screening are
transformed with another YAC library to start another shuffling cycle.
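
The recursive structure of this scheme (recombine, screen, repeat) can be sketched as a toy simulation. This is only an illustrative model under stated assumptions, not the procedure of Fig. 10: each cell is reduced to a list of loci carrying a useful allele (1) or not (0), recombination is modeled as uniform crossover between two randomly chosen cells, and screening keeps the cells with the most useful alleles. All names and parameter values are hypothetical.

```python
import random


def recombine(p1, p2, rng):
    """Uniform crossover: each locus is inherited from one of the two parents."""
    return [rng.choice(pair) for pair in zip(p1, p2)]


def shuffle_round(population, rng, keep):
    """One cycle: random mating within the pool, then truncation screening."""
    offspring = [recombine(*rng.sample(population, 2), rng)
                 for _ in range(len(population))]
    pool = population + offspring        # parents stay in the pool (assumption)
    pool.sort(key=sum, reverse=True)     # "fitness" = number of useful alleles
    return pool[:keep]


rng = random.Random(1)                   # fixed seed so the run is repeatable
n_loci, pop_size = 12, 30
# Hypothetical starting library: a useful allele at roughly 25% of loci.
population = [[1 if rng.random() < 0.25 else 0 for _ in range(n_loci)]
              for _ in range(pop_size)]

best_per_round = [max(map(sum, population))]
for _ in range(8):
    population = shuffle_round(population, rng, keep=pop_size)
    best_per_round.append(sum(population[0]))

print(best_per_round)
```

Because the parents remain in the screened pool, the best score in this model cannot decrease from one round to the next; introducing a fresh library each cycle could be modeled by appending new random cells before mating.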

10. Method of Evolving YACs for Transfer to Recipient Strain 

[0249] These methods are based in part on the fact that multiple YACs can be harbored in the same yeast cell, and
YAC-YAC recombination is known to occur (Green & Olson, Science 250, 94-98 (1990)). Inter-YAC recombination
provides a format in which families of homologous genes harbored on fragments of >20 kb can be shuffled in vivo. The
starting population of DNA fragments show sequence similarity with each other but differ as a result of, for example,
induced, allelic or species diversity. Often DNA fragments are known or suspected to encode multiple genes that function
in a common pathway.

[0250] The fragments are cloned into a YAC and transformed into yeast, typically with positive selection for
transformants. The transformants are induced to sporulate, as a result of which chromosomes undergo meiosis. The cells are
then mated. Most of the resulting diploid cells now carry two YACs each having a different insert. These are again
induced to sporulate and mated. The resulting cells harbor YACs of recombined sequence. The cells can then be
screened or selected for a desired property. Typically, such selection occurs in the yeast strain used for shuffling.
However, if fragments being shuffled are not expressed in yeast, YACs can be isolated and transferred to an appropriate
cell type in which they are expressed for screening. Examples of such properties include the synthesis or degradation
of a desired compound, increased secretion of a desired gene product, or other detectable phenotype.

[0251] Preferably, the YAC library is transformed into haploid a and haploid α cells. These cells are then induced to
mate with each other, i.e., they are pooled and induced to mate by growth on rich medium. The diploid cells, each
carrying two YACs, are then transferred to sporulation medium. During sporulation, the cells undergo meiosis, and
homologous chromosomes recombine. In this case, the genes harbored in the YACs will recombine, diversifying their
sequences. The resulting haploid ascospores are then liberated from the asci by enzymatic degradation of the asci wall
or other available means and the pooled liberated haploid ascospores are induced to mate by transfer to rich medium.
This process is repeated for several cycles to increase the diversity of the DNA cloned into the YACs. The resulting
population of yeast cells, preferably in the haploid state, is either screened for improved properties, or the diversified
DNA is delivered to another host cell or organism for screening.

[0252] Cells surviving selection/screening are subjected to successive cycles of pooling, sporulation, mating and
selection/screening until the desired phenotype has been observed. Recombination can be achieved simply by
transferring cells from rich medium to carbon and nitrogen limited medium to induce sporulation, and then returning the spores
to rich media to induce mating. Asci can be lysed to stimulate mating of spores originating from different asci.
[0253] After YACs have been evolved to encode a desired property they can be transferred to other cell types. Transfer
can be by protoplast fusion, or retransformation with isolated DNA. For example, transfer of YACs from yeast to
mammalian cells is discussed by Monaco & Larin, Trends in Biotechnology 12, 280-286 (1994); Montoliu et al., Reprod.
Fertil. Dev. 6, 577-84 (1994); Lamb et al., Curr. Opin. Genet. Dev. 5, 342-8 (1995).

[0254] An exemplary scheme for shuffling a YAC fragment library in yeast is shown in Fig. 11. A library of YAC
fragments representing genetic variants is transformed into yeast that have diploid endogenous chromosomes. The
transformed yeast continue to have diploid endogenous chromosomes, plus a single YAC. The yeast are induced to
undergo meiosis and sporulate. The spores contain haploid genomes and are selected for those which contain a YAC,
using the YAC selective marker. The spores are induced to mate, generating diploid cells. The diploid cells now contain
two YACs bearing different inserts as well as diploid endogenous chromosomes. The cells are again induced to undergo
meiosis and sporulate. During meiosis, recombination occurs between the YAC inserts, and recombinant YACs are
segregated to ascocytes. Some ascocytes thus contain haploid endogenous chromosomes plus a YAC chromosome with
a recombinant insert. The ascocytes mature to spores, which can mate again, generating diploid cells. Some diploid cells
now possess a diploid complement of endogenous chromosomes plus two recombinant YACs. These cells can then be
taken through further cycles of meiosis, sporulation and mating. In each cycle, further recombination occurs between
YAC inserts and further recombinant forms of inserts are generated. After one or several cycles of recombination have
occurred, cells can be tested for acquisition of a desired property. Further cycles of recombination, followed by selection,
can then be performed in similar fashion.

11. In vivo Shuffling of Genes by the Recursive Mating of Yeast Cells Harboring Homologous Genes in Identical Loci

[0255] A goal of DNA shuffling is to mimic and expand the combinatorial capabilities of sexual recombination. In vitro
DNA shuffling succeeds in this process. However, by changing the mechanism of recombination and altering the
conditions under which recombination occurs naturally, in vitro recombination methods may jeopardize intrinsic information
in a DNA sequence that renders it "evolvable."

[0256] Shuffling in vivo by employing the natural crossing over mechanisms that occur during meiosis may access
inherent natural sequence information and provide a means of creating higher quality shuffled libraries. Described here
is a method for the in vivo shuffling of DNA that utilizes the natural mechanisms of meiotic recombination and provides
an alternative method for DNA shuffling.

[0257] The basic strategy is to clone genes to be shuffled into identical loci within the haploid genome of yeast. The
haploid cells are then recursively induced to mate and to sporulate. The process subjects the cloned genes to recursive
recombination during recursive cycles of meiosis. The resulting shuffled genes are then screened in situ or isolated
and screened under different conditions.

[0258] For example, if one wished to shuffle a family of five lipase genes, the following provides a means of doing so
in vivo.

[0259] The open reading frame of each lipase is amplified by the PCR such that each ORF is flanked by identical 3'
and 5' sequences. The 5' flanking sequence is identical to a region within the 5' coding sequence of the S. cerevisiae
ura3 gene and the 3' flanking sequence is identical to a region within the 3' of the ura3 gene. The flanking sequences
are chosen such that homologous recombination of the PCR product with the ura3 gene results in the incorporation of
the lipase gene and the disruption of the ura3 ORF. Both S. cerevisiae a and α haploid cells are then transformed with
each of the PCR amplified lipase ORFs, and cells having incorporated a lipase gene into the ura3 locus are selected
by growth on 5-fluoro-orotic acid (5-FOA is lethal to cells expressing functional URA3). The result is 10 cell types, two
different mating types each harboring one of the five lipase genes in the disrupted ura3 locus. These cells are then
pooled and grown under conditions where mating between the a and α cells is favored, e.g., in rich medium.
[0260] Mating results in a combinatorial mixture of diploid cells having all 32 possible combinations of lipase genes
in the two ura3 loci. The cells are then induced to sporulate by growth under carbon and nitrogen limited conditions.
During sporulation the diploid cells undergo meiosis to form four (two a and two α) haploid ascospores housed in an
ascus. During meiosis II of the sporulation process sister chromatids align and crossover. The lipase genes cloned into
the ura3 loci will also align and recombine. Thus the resulting haploid ascospores will represent a library of cells each
harboring a different possible chimeric lipase gene, each a unique result of the meiotic recombination of the two lipase
genes in the original diploid cell. The walls of asci are degraded by treatment with zymolyase to liberate and allow the
mixing of the individual ascospores. This mixture is then grown under conditions that promote the mating of the a and
α haploid cells. It is important to liberate the individual ascospores, since mating will otherwise occur between the
ascospores within an ascus. Mixing of the haploid cells allows recombination between more than two lipase genes,
enabling "poolwise recombination." Mating brings together new combinations of chimeric genes that can then undergo
recombination upon sporulation. The cells are recursively cycled through sporulation, ascospore mixing, and mating
until sufficient diversity has been generated by the recursive pairwise recombination of the five lipase genes. The individual
chimeric lipase genes either can be screened directly in the haploid yeast cells or transferred to an appropriate expression
host.
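
The way recursive pooled mating spreads sequence from more than two parental genes into a single chimera can be illustrated with a small simulation. This is a simplified sketch under stated assumptions rather than a model of actual meiosis: each lipase gene is represented as a string of one repeated parent label, every mating produces exactly one reciprocal pair of single-crossover products (one assigned to each mating type), and all function and variable names are hypothetical.

```python
import random


def crossover(g1, g2, rng):
    """Single reciprocal crossover at a random internal position."""
    point = rng.randrange(1, len(g1))
    return g1[:point] + g2[point:], g2[:point] + g1[point:]


def mating_cycle(a_cells, alpha_cells, rng):
    """Pool and mix the haploids, mate a with alpha, and return the new haploid pools."""
    rng.shuffle(a_cells)
    rng.shuffle(alpha_cells)
    new_a, new_alpha = [], []
    for g_a, g_alpha in zip(a_cells, alpha_cells):
        c1, c2 = crossover(g_a, g_alpha, rng)
        new_a.append(c1)          # one reciprocal product per mating type (assumption)
        new_alpha.append(c2)
    return new_a, new_alpha


def max_parents(pool):
    """Largest number of distinct parental genes contributing to any one chimera."""
    return max(len(set(gene)) for gene in pool)


rng = random.Random(0)            # fixed seed so the run is repeatable
parents = "ABCDE"                 # five parental lipase genes
gene_len = 20                     # arbitrary number of positions per gene
a_cells = [p * gene_len for p in parents]
alpha_cells = [p * gene_len for p in parents]

parents_per_cycle = []
for cycle in range(4):
    a_cells, alpha_cells = mating_cycle(a_cells, alpha_cells, rng)
    parents_per_cycle.append(max_parents(a_cells + alpha_cells))

print(parents_per_cycle)
```

In this model a chimera can carry sequence from at most two parents after the first cycle and at most four after the second, which is why several cycles of mixing and mating are needed before all five genes can contribute to one chimera.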

[0261] The process is described above for lipases and yeast; however, any sexual organisms into which genes can 
be directed can be employed, and any genes, of course, could be substituted for lipases. This process is analogous to
the method of shuffling whole genomes by recursive pairwise mating. The diversity, however, in the whole genome case 
is distributed throughout the host genome rather than localized to specific loci. 

12. Use of YACs to Clone Unlinked Genes 


[0262] Shuffling of YACs is particularly amenable to transfer of unlinked but functionally related genes from one species 
to another, particularly where such genes have not been identified. Such is the case for several commercially important 
natural products, such as taxol. Transfer of the genes in the metabolic pathway to a different organism is often desirable 
because organisms naturally producing such compounds are not well suited for mass culturing.

[0263] Clusters of such genes can be isolated by cloning a total genomic library of DNA from an organism producing
a useful compound into a YAC library. The YAC library is then transformed into yeast. The yeast is sporulated and mated
such that recombination occurs between YACs and/or between YACs and natural yeast chromosomes.
Selection/screening is then performed for expression of the desired collection of genes. If the genes encode a biosynthetic pathway,
expression can be detected from the appearance of product of the pathway. Production of individual enzymes in the
pathway, or intermediates of the final expression product or capacity of cells to metabolize such intermediates indicates
partial acquisition of the synthetic pathway. The original library or a different library can be introduced into cells surviving
selection/screening, and further rounds of recombination and selection/screening can be performed until the end product
of the desired metabolic pathway is produced.

13. YAC-YAC Shuffling

[0264] If a phenotype of interest can be isolated to a single stretch of genomic DNA less than 2 megabases in length, 
it can be cloned into a YAC and replicated in S. cerevisiae. The cloning of similar stretches of DNA from related hosts 
into an identical YAC results in a population of yeast cells each harboring a YAC having a homologous insert effecting 
a desired phenotype. The recursive breeding of these yeast cells allows the homologous regions of these YACs to
recombine during meiosis, allowing genes, pathways, and clusters to recombine during each cycle of meiosis. After 
several cycles of mating and segregation, the YAC inserts are well shuffled. The now very diverse yeast library could 
then be screened for phenotypic improvements resulting from the shuffling of the YAC inserts. 

14. YAC-Chromosome Shuffling

[0265] "Mitotic" recombination occurs during cell division and results from the recombination of genes during replication.
This type of recombination is not limited to that between sister chromatids and can be enhanced by agents that induce
recombination machinery, such as nicking chemicals and ultraviolet irradiation. Since it is often difficult to directly mate
across a species barrier, it is possible to induce the recombination of homologous genes originating from different species
by providing the target genes to a desired host organism as a YAC library. The genes harbored in this library are then
induced to recombine with homologous genes on the host chromosome by enhanced mitotic recombination. This process
is carried out recursively to generate a library of diverse organisms and then screened for those having the desired
phenotypic improvements. The improved subpopulation is then mated recursively as above to identify new strains having
accumulated multiple useful genetic alterations.

15. Accumulation of Multiple YACs Harboring Useful Genes 


[0266] The accumulation of multiple unlinked genes that are required for the acquisition or improvement of a given
phenotype can be accomplished by the shuffling of YAC libraries. Genomic DNA from organisms having desired
phenotypes, such as ethanol tolerance, thermotolerance, and the ability to ferment pentose sugars, is pooled, fragmented
and cloned into several different YAC vectors, each having a different selective marker (his, ura, ade, etc). S. cerevisiae
are transformed with these libraries, and selected for their presence (using selective media, i.e., uracil dropout media for
the YAC containing the Ura3 selective marker) and then screened for having acquired or improved a desired phenotype.
Surviving cells are pooled, mated recursively, and selected for the accumulation of multiple YACs (by propagation in
medium with multiple nutritional dropouts). Cells that acquire multiple YACs harboring useful genomic inserts are identified
by further screening. Optimized strains can be used directly; however, due to the burden a YAC may pose to a cell, the
relevant YAC inserts can be minimized, subcloned, and recombined into the host chromosome, to generate a more
stable production strain.
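
The logic of selecting for accumulated YACs on multiple-dropout medium can be expressed as a simple set test. The sketch below is illustrative only: it assumes a host strain auxotrophic for every nutrient listed, and the specific marker genes HIS3 and ADE2 are hypothetical stand-ins for the "his" and "ade" markers mentioned above (URA3 is named in the text).

```python
# Hypothetical mapping from YAC selectable marker to the nutrient it makes dispensable.
MARKER_TO_NUTRIENT = {"HIS3": "histidine", "URA3": "uracil", "ADE2": "adenine"}


def grows_on(yac_markers, dropped_nutrients):
    """A cell grows only if its YAC markers cover every nutrient omitted from the medium.

    Assumes the host is auxotrophic for all nutrients in MARKER_TO_NUTRIENT.
    """
    covered = {MARKER_TO_NUTRIENT[m] for m in yac_markers}
    return set(dropped_nutrients) <= covered


# Example cells carrying one, two, or three YACs with different markers.
cells = {
    "cell_1": {"URA3"},
    "cell_2": {"URA3", "HIS3"},
    "cell_3": {"URA3", "HIS3", "ADE2"},
}

triple_dropout = ["uracil", "histidine", "adenine"]
survivors = [name for name, markers in cells.items()
             if grows_on(markers, triple_dropout)]
print(survivors)
```

Only the cell carrying all three markers passes the triple-dropout test, whereas all three example cells would grow on uracil dropout medium alone.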

16. Choice of Host SSF Organism 

[0267] One example use for the present invention is to create an improved yeast for the production of ethanol from
lignocellulosic biomass. Specifically, a yeast strain with improved ethanol tolerance and thermostability/thermotolerance
is desirable. Parent yeast strains known for good behavior in a Simultaneous Saccharification and Fermentation (SSF)
process are identified. These strains are combined with others known to possess ethanol tolerance and/or thermostability.
[0268] S. cerevisiae is highly amenable to development for optimized SSF processes. It inherently possesses several
traits for this use, including the ability to import and ferment a variety of sugars such as sucrose, glucose, galactose,
maltose and maltotriose. Also, yeast has the capability to flocculate, enabling recovery of the yeast biomass at the end of
a fermentation cycle, and allowing its re-use in subsequent bioprocesses. This is an important property in that it optimizes
the use of nutrients in the growth medium. S. cerevisiae is also highly amenable to laboratory manipulation, has highly
characterized genetics and possesses a sexual reproductive cycle. S. cerevisiae may be grown under either aerobic or
anaerobic conditions, in contrast to some other potential SSF organisms that are strict anaerobes (e.g. Clostridium spp.),
making them very difficult to handle in the laboratory. S. cerevisiae is also "generally regarded as safe" ("GRAS"), and,
due to its widespread use for the production of important comestibles for the general public (e.g. beer, wine, bread, etc.),
is generally familiar and well known. S. cerevisiae is commonly used in fermentative processes, and the familiarity in
its handling by fermentation experts eases the introduction of novel improved yeast strains into the industrial setting.

[0269] S. cerevisiae strains that previously have been identified as particularly good SSF organisms, for example, S.
cerevisiae D5A (ATCC 200062) (South CR and Lynd LR, (1994) Appl. Biochem. Biotechnol. 45/46: 467-481; Ranatunga
TD et al. (1997) Biotechnol. Lett. 19:1125-1127) can be used for starting materials. In addition, other industrially used
S. cerevisiae strains are optionally used as host strains, particularly those showing desirable fermentative characteristics,
such as S. cerevisiae Y567 (ATCC 24858) (Sitton OC et al. (1979) Process Biochem. 14(9): 7-10; Sitton OC et al. (1981)
Adv. Biotechnol. 2: 231-237; McMurrough I et al. (1971) Folia Microbiol. 16: 346-349) and S. cerevisiae ACA 174 (ATCC
60868) (Benitez T et al. (1983) Appl. Environ. Microbiol. 45: 1429-1436; Chem. Eng. J. 50: B17-B22, 1992), which have
been shown to have desirable traits for large-scale fermentation.

17. Choice of Ethanol Tolerant Strains 

[0270] Many strains of S. cerevisiae have been isolated from high-ethanol environments, and have survived in the
ethanol-rich environment by adaptive evolution. For example, strains from Sherry wine aging ("Flor" strains) have evolved
highly functional mitochondria to enable their survival in a high-ethanol environment. It has been shown that transfer of
these wine yeast mitochondria to other strains increases the recipient's resistance to high ethanol concentration, as well
as thermotolerance (Jimenez, J. and Benitez, T. (1988) Curr. Genet. 13: 461-469). There are several flor strains deposited
in the ATCC, for example S. cerevisiae MY91 (ATCC 201301), MY138 (ATCC 201302), C5 (ATCC 201298), ET7 (ATCC
201299), 1-A6 (ATCC 201300), OSB21 (ATCC 241303), F23 (S. globosus ATCC 90920). Also, several flor strains of S.
uvarum and Torulaspora pretoriensis have been deposited. Other ethanol-tolerant wine strains include S. cerevisiae
ACA 174 (ATCC 60868), 15% ethanol, and S. cerevisiae ASA (ATCC 90921), isolated from wine containing 18% (v/v)
ethanol, and NRCC 202036 (ATCC 46534), also a wine yeast. Other S. cerevisiae ethanologens that additionally exhibit
enhanced ethanol tolerance include ATCC 24858, ATCC 24858, G 3706 (ATCC 42594), NRRL Y-265 (ATCC 60593),
and ATCC 24845 - ATCC 24860. A strain of S. pastorianus (S. carlsbergensis ATCC 2345) has high ethanol-tolerance
(13% v/v). S. cerevisiae Sa28 (ATCC 26603), from Jamaican cane juice sample, produces high levels of alcohol from
molasses, is sugar tolerant, and produces ethanol from wood acid hydrolyzate.

[0271] Several of the listed strains, as well as additional strains can be used as starting materials for breeding ethanol 
tolerance. 

18. Choice of Temperature Tolerant Strains

[0272] A few temperature tolerant strains have been reported, including the highly flocculent strain S. pastorianus SA
23 (S. carlsbergensis ATCC 26602), which produces ethanol at elevated temperatures, and S. cerevisiae Kyokai 7 (S.
sake, ATCC 26422), a sake yeast tolerant to brief heat and oxidative stress. Ballesteros et al. ((1991) Appl. Biochem.
Biotechnol. 28/29: 307-315) examined 27 strains of yeast for their ability to grow and ferment glucose in the 32-45°C
temperature range, including Saccharomyces, Kluyveromyces and Candida spp. Of these, the best thermotolerant
clones were Kluyveromyces marxianus LG and Kluyveromyces fragilis 2671 (Ballesteros et al. (1993) Appl. Biochem.
Biotechnol. 39/40: 201-211). S. cerevisiae-pretoriensis FDHI was somewhat thermotolerant, but was poor in ethanol
tolerance. Recursive recombination of this strain with others that display ethanol tolerance can be used to acquire the
thermotolerant characteristics of the strain in progeny which also display ethanol tolerance.

[0273] Candida acidothermophilum (Issatchenkia orientalis, ATCC 20381) is a good SSF strain that also exhibits
improved performance in ethanol production from lignocellulosic biomass at higher SSF temperatures than S. cerevisiae
D5A (Kadam, KL, Schmidt, SL (1997) Appl. Microbiol. Biotechnol. 48: 709-713). This strain can also be a genetic
contributor to an improved SSF strain.

19. Shuffling of Strains 

[0274] In those instances where strains are highly related, a recursive mating strategy may be pursued. For example,
a population of haploid S. cerevisiae (a and alpha) is mutagenized and screened for improved EtOH or thermal tolerance.
The improved haploid subpopulations are mixed together, mated as a pool and induced to sporulate. The resulting
haploid spores are freed by degrading the ascus wall and mixed. The freed spores are then induced to mate and sporulate
recursively. This process is repeated a sufficient number of times to generate all possible mutant combinations. The
whole genome shuffled population (haploid) is then screened for further EtOH or thermal tolerance.
[0275] When strains are not sufficiently related for recursive mating, formats based on protoplast fusion may be
employed. Recursive and poolwise protoplast fusion can be performed to generate chimeric populations of diverse
parental strains. The resultant pool of progeny is selected and screened to identify improved ethanol and thermal tolerant
strains.

[0276] Alternatively, a YAC-based Whole Genome Shuffling format can be used. In this format, YACs are used to
shuttle large chromosomal fragments between strains. As detailed earlier, recombination occurs between YACs or
between YACs and the host chromosomes. Genomic DNA from organisms having desired phenotypes is pooled,
fragmented and cloned into several different YAC vectors, each having a different selective marker (his, ura, ade, etc.).
S. cerevisiae are transformed with these libraries, selected for their presence (using selective media, e.g. uracil
dropout media for the YAC containing the Ura3 selective marker) and then screened for having acquired or improved a
desired phenotype. Surviving cells are pooled, mated recursively (as above), and selected for the accumulation of
multiple YACs (by propagation in medium with multiple nutritional dropouts). Cells that acquire multiple YACs harboring
useful genomic inserts are identified by further screening (see below).

20. Selection for Improved Strains 

[0277] Having produced large libraries of novel strains by mutagenesis and recombination, a first task is to isolate
those strains that possess improvements in the desired phenotypes. Identification of improved organisms within the libraries
is facilitated where the desired key traits are selectable phenotypes. For example, ethanol has different effects on the growth rate of
a yeast population, viability, and fermentation rate. Inhibition of cell growth and viability increases with ethanol concentration,
but high fermentative capacity is only inhibited at higher ethanol concentrations. Hence, selection of growing
cells in ethanol is a viable approach to isolate ethanol-tolerant strains. Subsequently, the selected strains may be analyzed
for their fermentative capacity to produce ethanol. Provided that growth and media conditions are the same for all strains
(parents and progeny), a hierarchy of ethanol tolerance may be constructed.

[0278] Simple selection schemes for identification of thermal tolerant and ethanol tolerant strains are available and,
in this case, are based on those previously designed to identify potentially useful SSF strains. Selection of ethanol
tolerance is performed by exposing the population to ethanol, then plating the population and looking for growth. Colonies
capable of growing after exposure to ethanol can be re-exposed to a higher concentration of ethanol and the cycle
repeated until the most tolerant strains are selected. In order to discern strains possessing heritable ethanol tolerance
from those with temporarily acquired adaptations, these cycles may be punctuated with cycles of growth in the absence of
selection (e.g. no ethanol).

[0279] Alternatively, the mixed population can be grown directly at increasing concentrations of ethanol, and the most
tolerant strains enriched (Aguilera and Benitez, 1986, Arch Microbiol 4:337-44). For example, this enrichment could be
carried out in a chemostat or turbidostat. Similar selections can be developed for thermal tolerance, in which strains are
identified by their ability to grow after a heat treatment, or directly for growth at elevated temperatures (Ballesteros et
al., 1991, Applied Biochem and Biotech, 28:307-315). The best strains identified by these selections will be assayed
more thoroughly in subsequent screens for ethanol, thermal tolerance or other properties of interest.
[0280] In one aspect, organisms having increased ethanol tolerance are selected for. A population of natural S.
cerevisiae isolates is mutagenized. This population is then grown under fermentor conditions under low initial ethanol
concentrations. Once the culture has reached saturation, the culture is diluted into fresh medium having a slightly higher
ethanol content. This process of successive dilution into medium of incrementally increasing ethanol concentration is
continued until a threshold of ethanol tolerance is reached. The surviving mutants having the highest ethanol
tolerance are then pooled and their genomes recombined by any method noted herein. Enrichment could also be achieved
by a continuous culture in a chemostat or turbidostat in which temperature or ethanol concentrations are progressively
elevated. The resulting shuffled population is then exposed once again to the enrichment strategy but at a higher
starting medium ethanol concentration. This strategy is optionally applied for the enrichment of thermotolerant cells and
for the enrichment of cells having combined thermo- and ethanol tolerance.
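
The successive-dilution enrichment described above can be illustrated with a toy model. The following is a minimal Python sketch, assuming each strain has a single fixed ethanol limit (% v/v) above which it does not grow. The function name, strain names, limits, starting concentration and step size are hypothetical illustration values and are not taken from this disclosure.

```python
def enrich_by_serial_dilution(ethanol_limit, start_pct, step_pct):
    """Passage a mixed population through medium of rising ethanol content.

    ethanol_limit maps a (hypothetical) strain name to the highest ethanol
    concentration, in % v/v, at which that strain still grows.
    Returns (highest concentration that still supported growth, survivors).
    """
    if step_pct <= 0:
        raise ValueError("step_pct must be positive")
    survivors = dict(ethanol_limit)
    highest_tolerated = None
    concentration = start_pct
    while True:
        # Strains whose limit is at or above the current concentration grow.
        growing = {s: lim for s, lim in survivors.items() if lim >= concentration}
        if not growing:
            # Nothing grows here: the threshold was reached at the previous passage.
            if highest_tolerated is None:
                return None, {}
            return highest_tolerated, survivors
        survivors = growing
        highest_tolerated = concentration
        concentration += step_pct


# Hypothetical mutagenized isolates and their ethanol limits (% v/v).
population = {"isolate_A": 8, "isolate_B": 11, "isolate_C": 13, "isolate_D": 9}
threshold, pool_for_shuffling = enrich_by_serial_dilution(
    population, start_pct=6, step_pct=2
)
print(threshold, sorted(pool_for_shuffling))
```

In this sketch the survivors returned at the threshold correspond to the pool that would be recombined before the next enrichment round, which would then start from a higher `start_pct`.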

21. Screening for Improved Strains 

[0281] Strains showing viability in initial selections are assayed more quantitatively for improvements in the desired
properties before being reshuffled with other strains.

[0282] Progeny resulting from mutagenesis of a strain, or those pre-selected for their ethanol tolerance and/or thermostability,
can be plated on non-selective agar. Colonies can be picked robotically into microtiter dishes and grown.
Cultures are replicated to fresh microtiter plates, and the replicates are incubated under the appropriate stress condition(s).
The growth or metabolic activity of individual clones may be monitored and ranked. Indicators of viability can range
from the size of growing colonies on solid media, density of growing cultures, or color change of a metabolic activity
indicator added to liquid media. Strains that show the greatest viability are then mixed and shuffled, and the resulting
progeny are rescreened under more stringent conditions.
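
The ranking step of such a microtiter screen can be sketched as follows. This is a minimal Python illustration, assuming culture density is read as optical density for an unstressed plate and a stressed replicate plate. The well names, readings and the choice of a stressed-to-control ratio as the score are assumptions made for illustration only.

```python
def rank_clones(control_od, stressed_od, top_n):
    """Rank clones by growth under stress relative to the unstressed replicate."""
    scores = {}
    for clone, control in control_od.items():
        if control <= 0 or clone not in stressed_od:
            continue  # skip wells that failed to grow or lack a replicate
        scores[clone] = stressed_od[clone] / control
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n]


# Hypothetical optical density readings per well.
control = {"A1": 1.20, "A2": 1.10, "A3": 0.00, "A4": 1.00}
stressed = {"A1": 0.30, "A2": 0.88, "A3": 0.00, "A4": 0.50}
best = rank_clones(control, stressed, top_n=2)
print(best)
```

The clones returned would be the ones mixed and shuffled before rescreening under more stringent conditions.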

22. Development of an Ethanologen Capable of Converting Cellulose to Ethanol 

[0283] Once a strain of yeast exhibiting thermotolerance and ethanol tolerance is developed, the degradation of
cellulose to monomeric sugars is provided by the inclusion in the host strain of an efficient cellulose degradation pathway.
[0284] Additional desirable characteristics can be useful to enhance the production of ethanol by the host. For example,
inclusion of heterologous enzymes and pathways that broaden the substrate sugar range may be performed. "Tuning" 
of the strain can be accomplished by the addition of various other traits, or the restoration of certain endogenous traits 
that are desirable, but lost during the recombination procedures. 

23. Conferring of Cellulase Activity

[0285] A vast number of cellulases and cellulase degradation systems have been characterized from fungi, bacteria
and yeast (see reviews by Béguin, P. and Aubert, J-P (1994) FEMS Microbiol. Rev. 13: 25-58; Ohmiya, K. et al. (1997)
Biotechnol. Genet. Eng. Rev. 14: 365-414). An enzymatic pathway required for efficient saccharification of cellulose
involves the synergistic action of endoglucanases (endo-1,4-β-D-glucanases, EC 3.2.1.4), exocellobiohydrolases (exo-
1,4-β-D-glucanases, EC 3.2.1.91), and β-glucosidases (cellobiases, 1,4-β-D-glucanases EC 3.2.1.21) (Fig. 9). The
heterologous production of cellulase enzymes in the ethanologen would enable the saccharification of cellulose, producing
monomeric sugars that may be used by the organism for ethanol production. There are several advantages to
the heterologous expression of a functional cellulase pathway in the ethanologen. For example, the SSF process would
eliminate the need for a separate bioprocess step for saccharification, and would ameliorate end-product inhibition of
cellulase enzymes by accumulated intermediate and product sugars.

[0286] Naturally occurring cellulase pathways are inserted into the ethanologen, or one may choose to use custom
improved "hybrid" cellulase pathways, employing the coordinate action of cellulases derived from different natural sources,
including thermophiles.

[0287] Several cellulases from non-Saccharomyces sources have been produced in and secreted from Saccharomyces successfully,
including bacterial, fungal, and yeast enzymes, for example T. reesei CBH I (Shoemaker (1994), in "The Cellulase
System of Trichoderma reesei: Trichoderma strain improvement and Expression of Trichoderma cellulases in Yeast,"
Online, Pinner, UK, 593-600). It is possible to employ straightforward metabolic engineering techniques to engender
cellulase activity in Saccharomyces. Also, yeast have been forced to acquire elements of cellulose degradation pathways
by protoplast fusion (e.g. intergeneric hybrids of Saccharomyces cerevisiae and Zygosaccharomyces fermentati, a
cellobiase-producing yeast, have been created (Pina A, et. al. (1986) Appl. Environ. Microbiol. 51: 995-1003)). In general,
any cellulase component enzyme that derives from a closely related yeast organism could be transferred by protoplast
fusion. Cellobiases produced by a somewhat broader range of yeast may be accessed by whole genome shuffling in
one of its many formats (e.g. whole, fragmented, YAC-based).

[0288] Optimally, the cellulase enzymes to be used should exhibit good synergy, an appropriate level of expression
and secretion from the host, good specific activity (i.e. resistance to host degradation factors and enzyme modification)
and stability in the desired SSF environment. An example of a hybrid cellulose degradation pathway having excellent
synergy includes the following enzymes: CBH I exocellobiohydrolase of Trichoderma reesei, the Acidothermus cellulolyticus
E1 endoglucanase, and the Thermomonospora fusca E3 exocellulase (Baker, et. al. (1998) Appl. Biochem.
Biotechnol. 70-72: 395-403).

[0289] It is suggested here that these enzymes (or improved mutants thereof) be considered for use in the SSF
organism, along with a cellobiase (β-glucosidase), such as that from Candida peltata. Other possible cellulase systems
to be considered should possess particularly good activity against crystalline cellulose, such as the T. reesei cellulase
system (Teeri, TT, et. al. (1998) Biochem. Soc. Trans. 26: 173-178), or possess particularly good thermostability characteristics
(e.g. cellulase systems from thermophilic organisms, such as Thermomonospora fusca (Zhang, S., et. al.
(1995) Biochem. 34: 3386-335)).

[0290] A rational approach to the cloning of cellulases in the ethanologenic yeast host could be used. For example,
known cellulase genes are cloned into expression cassettes utilizing S. cerevisiae promoter sequences, and the resultant
linear fragments of DNA may be transformed into the recipient host by placing short yeast sequences at the termini to
encourage site-specific integration into the genome. This is preferred to plasmidic transformation for reasons of genetic
stability and maintenance of the transforming DNA.

[0291] If an entire cellulose degradative pathway were introduced, a selection could be implemented in an agar-plate-based
format, and a large number of clones could be assayed for cellulase activity in a short period of time. For example,
selection for an exocellulase may be accessible by providing a soluble oligocellulose substrate or carboxymethylcellulose
(CMC) as a sole carbon source to the host, otherwise unable to grow on agar containing this sole carbon source. Clones
producing active cellulase pathways would grow by virtue of their ability to produce glucose.

[0292] Alternatively, if the different cellulases were to be introduced sequentially, it would be useful to first introduce
a cellobiase, enabling a selection using commercially available cellobiose as a sole carbon source. Several strains of
S. cerevisiae that are able to grow on cellobiose have been created by introduction of a cellobiase gene (e.g. Rajoka
MI, et. al. (1998) Folia Microbiol. (Praha) 43, 129-135; Skory, CD, et. al. (1996) Curr. Genet. 30, 417-422; D'Auria, S,
et. al. (1996) Appl. Biochem. Biotechnol. 61, 157-166; Adam, AC, et. al. (1995) Yeast 11, 395-406; Adam, AC (1991)
Curr. Genet. 20, 5-8).

[0293] Subsequent transformation of this organism with CBH I exocellulase can be selected for by growth on a cellulose
substrate such as carboxymethylcellulose (CMC). Finally, addition of an endoglucanase creates a yeast strain with
improved capacity to degrade crystalline cellulose.

24. Conferring of Pentose Sugar Utilization 

[0294] Inclusion of pentose sugar utilization pathways is an important facet to a potentially useful SSF organism. The
successful expression of xylose sugar utilization pathways for ethanol production has been reported in Saccharomyces
(e.g. Chen, ZD and Ho, NWY (1993) Appl. Biochem. Biotechnol. 39/40 135-147).

[0295] It would also be useful to accomplish L-arabinose substrate utilization for ethanol production in the Saccharomyces
host. Yeast strains that utilize L-arabinose include some Candida and Pichia spp. (McMillan JD and Boynton BL
(1994) Appl. Biochem. Biotechnol. 45-46: 569-584; Dien BS, et al. (1996) Appl. Biochem. Biotechnol. 57-58: 233-242).
Genes necessary for arabinose fermentation in E. coli could also be introduced by rational means (e.g. as has been
performed previously in Z. mobilis (Deanda K, et. al. (1996) Appl. Environ. Microbiol. 62: 4465-4470)).

25. Conferring of Other Useful Activities

[0296] Several other traits that are important for optimization of an SSF strain have been shown to be transferable to
S. cerevisiae. Like thermal tolerance, cellulase activity and pentose sugar utilization, these traits may not normally be
exhibited by Saccharomyces (or the particular strain of Saccharomyces being used as a host), and may be added by
genetic means. For example, expression of human muscle acylphosphatase in S. cerevisiae has been suggested to
increase ethanol production (Raugei, G., et. al. (1996) Biotechnol. Appl. Biochem. 23: 273-278).
[0297] It can occur that evolved stress-tolerant SSF strains acquire some undesirable mutations in the course of the
evolution strategy. Indeed, this is a pervasive problem in strain improvement strategies that rely on mutagenesis techniques,
and can result in highly unstable or fragile production strains. It is possible to restore some of the affected desirable
traits by rational methods such as cloning of specific genes that have been knocked out or negatively influenced in the
previous rounds of strain improvement. The advantage to this approach is specificity: the offending gene may be targeted
directly. The disadvantage is that it may be time-consuming and repetitious if several genes have been compromised,
and it only addresses problems that have been characterized. A preferred (and more traditional) approach to the removal
of undesirable/deleterious mutations is to back-cross the evolved strain to a desirable parent strain (e.g. the original
"host" SSF strain). This strategy has been employed successfully throughout strain improvement where accessible (i.e.
for organisms that have sexual cycles of reproduction). When lacking the advantage of a sexual process, it has been
accomplished by using other methods, such as parasexual recombination or protoplast fusion. For example, the ability
to flocculate was conferred on a non-flocculating strain of S. cerevisiae by protoplast fusion with a flocculation competent
S. cerevisiae (Watari, J., et. al (1990) Agric. Biol. Chem. 54: 1677-1681).

N. IN VITRO WHOLE GENOME SHUFFLING 

[0298] The shuffling of large DNA sequences, such as eukaryotic chromosomes, is difficult by prior art in vitro shuffling
methods. A method for overcoming this limitation is described herein.

[0299] The cells of related eukaryotic species are gently lysed and the intact chromosomes are liberated. The liberated
chromosomes are then sorted by FACS or similar method (such as pulsed-field electrophoresis) with chromosomes of
similar size being sequestered together. Each size fraction of the sorted chromosomes generally will represent a pool
of analogous chromosomes, for example the Y chromosome of related mammals. The goal is to isolate intact chromosomes
that have not been irreversibly damaged.

[0300] The fragmentation and reassembly of such large complex pieces of DNA employing DNA polymerases is
difficult and would likely introduce an unacceptably high level of random mutations. An alternative approach that employs
restriction enzymes and DNA ligase provides a feasible, less destructive solution. A chromosomal fraction is digested
with one or more restriction enzymes that recognize long DNA sequences (~15-20 bp), such as the intron and intein
encoded endonucleases (I-Ppo I, I-Ceu I, PI-Psp I, PI-Tli I, PI-Sce I (VDE)). These enzymes each cut, at most, a few
times within each chromosome, resulting in a combinatorial mixture of large fragments, each having overhanging single
stranded termini that are complementary to other sites cleaved by the same enzyme.

[0301] The digest is further modified by very short incubation with a single stranded exonuclease. The polarity of the
nuclease chosen is dependent on the single stranded overhang resulting from the restriction enzyme chosen: a 5'-3'
exonuclease for 3'-overhangs, and a 3'-5' exonuclease for 5'-overhangs. This digestion results in significantly long regions
of ssDNA overhang on each dsDNA terminus. The purpose of this incubation is to generate ssDNA regions that define
specific regions of DNA where recombination can occur. The fragments are then incubated under conditions where the
ends of the fragments anneal with other fragments having homologous ssDNA termini. Often, the two fragments annealing
will have originated from different chromosomes and in the presence of DNA ligase are covalently linked to form a
chimeric chromosome. This generates genetic diversity mimicking the crossing over of homologous chromosomes. The
complete ligation reaction will contain a combinatorial mixture of all possible ligations of fragments having homologous
overhanging termini. A subset of this population will be complete chimeric chromosomes.
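
The combinatorial character of the ligation mixture can be made concrete with a small enumeration. The following Python sketch rests on simplifying assumptions that are not stated in the text: every parental chromosome is cut at the same conserved sites, so that segment i of any parent can ligate only to segment i+1 of any parent, and only full-length, correctly ordered products are counted. The segment labels and species names are hypothetical.

```python
from itertools import product


def full_length_chimeras(parent_segments):
    """Enumerate full-length reassemblies of analogous chromosomes.

    parent_segments maps a parent name to its ordered list of segments.
    Each chimera takes segment position i from any one of the parents.
    """
    lengths = {len(segs) for segs in parent_segments.values()}
    if len(lengths) != 1:
        raise ValueError("all parents must yield the same number of segments")
    parents = sorted(parent_segments)
    n_segments = lengths.pop()
    chimeras = []
    for choice in product(parents, repeat=n_segments):
        # choice[i] names the parent that donates segment position i.
        chimeras.append(tuple(parent_segments[p][i] for i, p in enumerate(choice)))
    return chimeras


# Two hypothetical parents, each cut twice to give three ordered segments.
parents = {
    "species_1": ["a1", "b1", "c1"],
    "species_2": ["a2", "b2", "c2"],
}
chimeras = full_length_chimeras(parents)
print(len(chimeras))
```

Under these assumptions, n parents cut at k shared sites give n^(k+1) full-length products, of which n are simply the reconstituted parental chromosomes.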

[0302] To screen the shuffled library, the chromosomes are delivered to a suitable host in a manner allowing for the
uptake and expression of entire chromosomes. For example, YACs (yeast artificial chromosomes) can be delivered to
eukaryotic cells by protoplast fusion. Thus, the shuffled library could be encapsulated in liposomes and fused with protoplasts
of the appropriate host cell. The resulting transformants would be propagated and screened for the desired
cellular improvements. Once an improved population was identified, the chromosomes would be isolated, shuffled, and
screened recursively.

O. WHOLE GENOME SHUFFLING OF NATURALLY COMPETENT MICROORGANISMS 

[0303] Natural competence is a phenomenon observed for some microbial species whereby individual cells take up
DNA from the environment and incorporate it into their genome by homologous recombination. Bacillus subtilis and
Acinetobacter spp. are known to be particularly efficient at this process. A method for the whole genome shuffling
(WGS) of these and analogous organisms is described employing this process.

[0304] One goal of whole genome shuffling is the rapid accumulation of useful mutations from a population of individual
strains into one superior strain. If the organisms to be evolved are naturally competent, then a split pooled strategy for
the recursive transformation of naturally competent cells with DNA originating from the pool will effect this process. An
example procedure is as follows.

[0305] A population of naturally competent organisms that demonstrates a variety of useful traits (such as increased
protein secretion) is identified. The strains are pooled, and the pool is split. One half of the pool is used as a source of
gDNA, while the other is used to generate a pool of naturally competent cells.

[0306] The competent cells are grown in the presence of the pooled gDNA to allow DNA uptake and recombination.
Cells of one genotype take up and incorporate gDNA from cells of a different type, generating cells having chimeric
genomes. The result is a population of cells representing a combinatorial mixture of the genetic variations originating in
the original pool. These cells are pooled again and transformed with the same source of DNA again. This process is
carried out recursively to increase the diversity of the genomes of cells resulting from transformation. Once sufficient
diversity has been generated, the cell population is screened for new chimeric organisms demonstrating desired improvements.

[0307] This process is enhanced by increasing the natural competence of the host organism. COMS is a protein that,
when expressed in B. subtilis, enhances the efficiency of natural competence mediated transformation more than an
order of magnitude.

[0308] It was demonstrated that approximately 100% of the cells harboring the plasmid pCOMS take up and recombine
genomic DNA fragments into their genomes. In general, approximately 10% of the genome is recombined into any given
transformed cell. This observation was demonstrated by the following.

[0309] A strain of B. subtilis pCOMS auxotrophic for two nutritional markers was transformed with genomic DNA
(gDNA) isolated from a prototrophic strain of the same organism. 10% of the cells exposed to the DNA were prototrophic
for one of the two nutrient markers. The average size of the DNA strand taken up by B. subtilis is approximately 50 kb
or ~2% of the genome. Thus 1 of every ten cells had recombined a marker that was represented in 1 of every fifty molecules
of gDNA taken up. Thus, most of the cells take up and recombine with approximately five 50 kb molecules or 10% of the
genome. This method represents a powerful tool for rapidly and efficiently recombining whole microbial genomes.
[0310] In the absence of pCOMS, only 0.3% of the cells prepared for natural competency take up and integrate a
specific marker. This suggested that about 15% of the cells actually underwent recombination with a single genomic
fragment. Thus, a recursive transformation strategy as described above produces a whole genome shuffled library, even
in the absence of pCOMS. In the absence of pCOMS, however, the complex genomes will represent a smaller, but still
screenable percentage of the transformed or shuffled population.
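
The arithmetic behind these estimates can be written out explicitly. The short Python sketch below uses only the figures quoted above (a fragment of about 50 kb being about 2% of the genome, and marker acquisition frequencies of 10% with pCOMS and 0.3% without). The function and variable names are illustrative, and the linear scaling is the same simplification the text itself makes.

```python
FRAGMENT_FRACTION = 0.02  # one ~50 kb fragment is ~2% of the genome (from the text)


def fragments_per_cell(marker_frequency, fragment_fraction=FRAGMENT_FRACTION):
    """Average fragments recombined per exposed cell.

    If a given marker sits on 1 in 50 fragments, a cell population that
    acquires the marker at frequency f has recombined about f / (1/50)
    fragments per cell on average.
    """
    return marker_frequency / fragment_fraction


with_pcoms = fragments_per_cell(0.10)        # about five fragments per cell
genome_fraction = with_pcoms * FRAGMENT_FRACTION  # about 10% of the genome
without_pcoms = fragments_per_cell(0.003)    # about 0.15, i.e. ~15% of cells
print(round(with_pcoms, 2), round(genome_fraction, 2), round(without_pcoms, 2))
```

The value of about 0.15 without pCOMS is read in the text as roughly 15% of cells having recombined a single fragment.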

P. CONGRESSION 

[0311] Congression is the integration of two independent unlinked markers into a cell. 0.3% of naturally competent B.
subtilis cells integrate a single marker (described above). Of these, about 10% have taken up an additional marker.
Thus, if one selects or screens for the integration of one specific marker, 10% of the resulting population will have
integrated another specific marker. This provides a way of enriching for specific integration events.
[0312] For example, if one is looking for the integration of a gene for which there is no easy screen or selection, it will
exist as 0.3% of the cell population. If the population is first selected for a specific integration event, then the desired
integration will be found in 10% of the population. This represents a significant (~30-fold) enrichment for the desired
event. This enrichment is defined as the "congression effect." The congression effect is not influenced by the presence
of pCOMS; thus the "pCOMS effect" is simply to increase the percentage of cells prepared for natural competence that are truly
naturally competent from about 15% in its absence to 100% in its presence. All competent cells still take up about the
same amount of DNA, or ~10% of the Bacillus genome.
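
The size of the congression effect, and what it means for screening effort, can be checked with the two frequencies given above. This is a minimal Python sketch; the function names are illustrative, and the screening estimate simply assumes that clones are sampled at random.

```python
def congression_enrichment(unselected_frequency, frequency_among_selected):
    """Fold enrichment gained by first selecting for an unlinked marker."""
    return frequency_among_selected / unselected_frequency


def clones_to_screen(target_frequency, hits_wanted=1):
    """Expected number of randomly picked clones needed to find hits_wanted events."""
    return hits_wanted / target_frequency


# Frequencies quoted in the text: 0.3% unselected, 10% after selection.
enrichment = congression_enrichment(0.003, 0.10)
print(round(enrichment, 1))
print(round(clones_to_screen(0.003)), round(clones_to_screen(0.10)))
```

The ratio works out to about 33, consistent with the roughly 30-fold enrichment stated in the text.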

[0313] The congression effect can be used in the following examples to enhance whole genome shuffling as well as
the targeted integration of shuffled genes to the chromosome.

Q. B. SUBTILIS SHUFFLING 

[0314] A population of B. subtilis cells having desired properties is identified, pooled and shuffled as described above
with one exception: once the pooled population is split, half of the population is transformed with an antibiotic selection
marker that is flanked by sequence that targets its integration into, and disruption of, a specific nutritional gene, for example,
one involved in amino acid biosynthesis. Transformants resistant to the drug are auxotrophic for that nutrient. The resistant
population is pooled and grown under conditions rendering them naturally competent (or optionally first transformed with
pCOMS).

[0315] The competent cells are then transformed with gDNA isolated from the original pool, and prototrophs are
selected. The prototrophic population will have undergone recombination with genomic fragments encoding a functional
copy of the nutritional marker, and thus will be enriched for cells having undergone recombination at other genetic loci
by the congression effect.
R. TARGETING OF GENES AND GENE LIBRARIES TO THE CHROMOSOME



[0316] It is useful to be able to efficiently deliver genes or gene libraries directly to a specific location in a cell's
chromosome. As above, target cells are transformed with a positive selection marker flanked by sequences that target
its homologous recombination into the chromosome. Selected cells harboring the marker are made naturally competent
(with or without pCOMS, but preferably the former) and transformed with a mixture of two sets of DNA fragments. The
first set contains a gene or a shuffled library of genes, each flanked with sequence to target its integration to a specific
chromosomal locus. The second set contains a positive selection marker (different from that first integrated into the cells)
flanked by sequence that will target its integration and replacement of the first positive selection marker. Under optimal
conditions, the mixture is such that the gene or gene library is in molar excess over the positive selection marker.
Transformants are then selected for cells containing the new positive marker. These cells are enriched for cells having
integrated a copy of the desired gene or gene library by the congression effect and can be directly screened for cells
harboring the gene or gene variants of interest. This process was carried out using PCR fragments <10 kb, and it was
found that, employing the congression effect, a population can be enriched such that 50% of the cells are congressants.
Thus, one in two cells contained a gene or gene variant.

[0317] Alternatively, the expression host can lack the first positive selection marker, and the competent cells
are transformed with a mixture of the target genes and a limiting amount of the first positive selection marker fragment.
Cells selected for the positive marker are screened for the desired properties in the targeted genes. The improved genes
are amplified by PCR, shuffled again, and then returned to the original host again with the first positive selection
marker. This process is carried out recursively until the desired function of the genes is obtained. This process obviates
the need to construct a primary host strain and the need for two positive markers.

S. CONJUGATION-MEDIATED GENETIC EXCHANGE 

[0318] Conjugation can be employed in the evolution of cell genomes in several ways. Conjugative transfer of DNA
occurs during contact between cells. See Guiney (1993) in: Bacterial Conjugation (Clewell, ed., Plenum Press, New
York), pp. 75-104; Reimmann & Haas in Bacterial Conjugation (Clewell, ed., Plenum Press, New York 1993), at pp.
137-188 (incorporated by reference in their entirety for all purposes). Conjugation occurs between many types of gram
negative bacteria, and some types of gram positive bacteria. Conjugative transfer is also known between bacteria and
plant cells (Agrobacterium tumefaciens) or yeast. As discussed in patent 5,837,458, the genes responsible for conjugative
transfer can themselves be evolved to expand the range of cell types (e.g., from bacteria to mammals) between which
such transfer can occur.

[0319] Conjugative transfer is effected by an origin of transfer (oriT) and flanking genes (MOB A, B and C), and 15-25
genes, termed tra, encoding the structures and enzymes necessary for conjugation to occur. The transfer origin is defined
as the site required in cis for DNA transfer. Tra genes include tra A, B, C, D, E, F, G, H, I, J, K, L, M, N, P, Q, R, S, T,
U, V, W, X, Y, Z, vir AB (alleles 1-11), C, D, E, G, IHF, and FinOP. Tra genes can be expressed in cis or trans to oriT.
Other cellular enzymes, including those of the RecBCD pathway, RecA, SSB protein, DNA gyrase, DNA pol I, and DNA
ligase, are also involved in conjugative transfer. RecE or RecF pathways can substitute for RecBCD.
[0320] One structural protein encoded by a tra gene is the sex pilus, a filament constructed of an aggregate of a single
polypeptide protruding from the cell surface. The sex pilus binds to a polysaccharide on recipient cells and forms a
conjugative bridge through which DNA can transfer. This process activates a site-specific nuclease encoded by a MOB
gene, which specifically cleaves DNA to be transferred at oriT. The cleaved DNA is then threaded through the conjugation
bridge by the action of other tra enzymes.

[0321] Mobilizable vectors can exist in episomal form or integrated into the chromosome. Episomal mobilizable vectors 
can be used to exchange fragments inserted into the vectors between cells. Integrated mobilizable vectors can be used
to mobilize adjacent genes from the chromosome. 

T. USE OF INTEGRATED MOBILIZABLE VECTORS TO PROMOTE EXCHANGE OF GENOMIC DNA 

[0322] The F plasmid of E. coli integrates into the chromosome at high frequency and mobilizes genes unidirectionally
from the site of integration (Clewell, 1993, supra; Firth et al., in Escherichia coli and Salmonella Cellular and Molecular
Biology 2, 2377-2401 (1996); Frost et al., Microbiol. Rev. 58, 162-210 (1994)). Other mobilizable vectors do not spontaneously
integrate into a host chromosome at high efficiency, but can be induced to do so by growth under particular
conditions (e.g., treatment with a mutagenic agent, growth at a nonpermissive temperature for plasmid replication). See
Reimmann & Haas in Bacterial Conjugation (ed. Clewell, Plenum Press, NY 1993), Ch. 6. Of particular interest is the IncP
group of conjugal plasmids, which are typified by their broad host range (Clewell, 1993, supra).

[0323] Donor "male" bacteria which bear a chromosomal insertion of a conjugal plasmid, such as the E. coli F factor,
can efficiently donate chromosomal DNA to recipient "female" enteric bacteria which lack F (F-). Conjugal transfer from
donor to recipient is initiated at oriT. Transfer of the nicked single strand to the recipient occurs in a 5' to 3' direction by
a rolling circle mechanism which allows mobilization of tandem chromosomal copies. Upon entering the recipient, the
donor strand is discontinuously replicated. The linear, single-stranded donor DNA strand is a potent substrate for initiation
of recA-mediated homologous recombination within the recipient. Recombination between the donor strand and recipient
chromosomes can result in the inheritance of donor traits. Accordingly, strains which bear a chromosomal copy of F are
designated Hfr (for high frequency of recombination) (Low, 1996 in Escherichia coli and Salmonella Cellular and Molecular
Biology vol. 2, pp. 2402-2405; Sanderson, in Escherichia coli and Salmonella Cellular and Molecular Biology
2, 2406-2412 (1996)).

[0324] The ability of strains with integrated mobilizable vector to transfer chromosomal DNA provides a rapid and
efficient means of exchanging genetic material between a population of bacteria, thereby allowing combination of positive
mutations and dilution of negative mutations. Such shuffling methods typically start with a population of strains with an
integrated mobilizable vector encompassing at least some genetic diversity. The genetic diversity can be the result of
natural variation, exposure to a mutagenic agent or introduction of a fragment library. The population of cells is cultured
without selection to allow genetic exchange, recombination and expression of recombinant genes. The cells are then
screened or selected for evolution toward a desired property. The population surviving selection/screening can then be
subject to a further round of shuffling by Hfr-mediated genetic exchange, or otherwise.

[0325] The natural efficiency of Hfr and other strains with integrated mob vectors as recipients of conjugal transfer
can be improved by several means. The relatively low recipient efficiency of natural Hfr strains is attributable to the
products of the traS and traT genes of F (Clewell, 1993, supra; Firth et al., 1996, supra; Frost et al., 1994, supra; Achtman
et al., J. Mol. Biol. 138, 779-795 (1980)). These products are localized to the inner and outer membranes of F+ strains,
respectively, where they serve to inhibit redundant matings between two strains which are both capable of donating
DNA. The effects of traS and traT, and cognate genes in other conjugal plasmids, can be eliminated by use of knockout
cells incapable of expressing these enzymes or reduced by propagating cells on a carbon-limited source (Peters et al.,
J. Bacteriol. 178, 3037-3043 (1996)).

[0326] In some methods, the starting population of cells has a mobilizable vector integrated at different genomic sites.
Directional transfer from oriT typically results in more frequent inheritance of traits proximal to oriT. This is because
mating pairs are fragile and tend to dissociate (particularly when in liquid medium), resulting in the interruption of transfer.
In a population of cells having a mobilizable vector integrated at different sites, chromosomal exchange occurs in a more
random fashion. Kits of Hfr strains are available from the E. coli Genetic Stock Center and the Salmonella Genetic
Stock Centre (Frost et al., 1994, supra). Alternatively, a library of strains with oriT at random sites and orientations can
be produced by insertion mutagenesis using a transposon which bears oriT. The use of a transposon bearing an oriT
[e.g., the Tn5-oriT described by Yakobson EA, et al., J. Bacteriol. 1984 Oct; 160(1): 451-453] provides a quick method
of generating such a library. Transfer functions for mobilization from the transposon-borne oriT sites are provided by a
helper vector in trans. It is possible to generate similar genetic constructs using other sequences known to one of skill
as well.

[0327] In one aspect, a recursive scheme for genomic shuffling using Tn-oriT elements is provided. A prototrophic
bacterial strain or set of related strains bearing a conjugal plasmid, such as the F fertility factor or a member of the IncP
group of broad host range plasmids, is mutagenized and screened for the desired properties. Individuals with the desired
properties are mutagenized with a Tn-oriT element and screened for acquisition of an auxotrophy (e.g., by replica-plating
to minimal and complete media) resulting from insertion of the Tn-oriT element in any one of many biosynthetic genes
scattered across the genome. The resulting auxotrophs are pooled and allowed to mate under conditions promoting
male-to-male matings, e.g., during growth in close proximity on a filter membrane. Note that transfer functions are
provided by the helper conjugal plasmid present in the original strain set. Recombinant transconjugants are selected on
minimal medium and screened for further improvement.

[0328] Optionally, strains bearing integrated mobilizable vectors are defective in mismatch repair gene(s). Inheritance
of donor traits which arise from sequence heterologies increases in strains lacking the methyl-directed mismatch repair
system. Optionally, the gene products which decrease recombination efficiency can be inhibited by small molecules.
[0329] Intergeneric conjugal transfer between species such as E. coli and Salmonella typhimurium, which are 20%
divergent at the DNA level, is also possible if the recipient strain is mutH, mutL or mutS (see Rayssiguier et al., Nature
342, 396-401 (1989)). Such transfer can be used to obtain recombination at several points as shown by the following
example.

[0330] One example uses an S. typhimurium Hfr donor strain having markers thr557 at map position 0, pyrF2690 at
33 min, serA13 at 62 min and hfrK5 at 43 min. MutS+/-, F- E. coli recipient strains had markers pyrD68 at 21 min,
aroC355 at 51 min, ilv3164 at 85 min and mutS215 at 59 min. The triauxotrophic S. typhimurium Hfr donor and isogenic
mutS+/- triauxotrophic E. coli recipients were inoculated into 3 ml of LB broth and shaken at 37°C until fully grown. 100 µl
of the donor and each recipient were mixed in 10 ml fresh LB broth, and then deposited onto a sterile Millipore 0.45 µm
HA filter using a Nalgene 250 ml reusable filtration device. The donor and recipients alone were similarly diluted and
deposited to check for reversion. The filters with cells were placed cell-side-up on the surface of an LB agar plate which
was incubated overnight at 37°C. The filters were removed with the aid of a sterile forceps and placed in a sterile 50 ml
tube containing 5 ml of minimal salts broth. Vigorous vortexing was used to wash the cells from the filters. 100 µl of
mating mixtures, as well as donor and recipient controls, were spread to LB for viable cell counts and to minimal glucose
supplemented with either two of the three recipient requirements for single recombinant counts, one of the three requirements
for double recombinant counts, or none of the three requirements for triple recombinant counts. The plates were
incubated for 48 hr at 37°C, after which colonies were counted.

Medium Supplements   Recombinant Genotype   Recombinant CFUs/Total CFUs    mutS-/mutS+
                                            mutS+          mutS-

Aro + Ilv            pyr+ aro- ilv-
Aro + Ura            pyr- aro- ilv+         1.2 x 10^-8    2.5 x 10^-6     208
Ilv + Ura            pyr- aro+ ilv-         2.7 x 10^-8    3.0 x 10^-6     111
Aro                  pyr+ aro- ilv+
Ilv                  pyr+ aro+ ilv-
Ura                  pyr- aro+ ilv+         <10^-9         <10^-9
nothing              pyr+ aro+ ilv+

Aro = aromatic amino acids and vitamins
Ilv = branched chain amino acids
Ura = uracil

[0331] The data indicate that recombinants can be generated at reasonable frequencies using Hfr matings. Intergeneric 
recombination is enhanced 100-200 fold in a recipient that is defective in methyl-directed mismatch repair.
[0332] Frequencies are further enhanced by increasing the ratio of donor to recipient cells, or by repeatedly mating 
the original donor strains with the previously generated recombinant progeny. 
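
The 100-200 fold figure can be checked directly from the two rows of the table that report values for both recipients. The short calculation below is only a worked check; it takes the mutS+ frequencies as 1.2 x 10^-8 and 2.7 x 10^-8, which is the reading of the table consistent with the reported ratios of 208 and 111.

```python
# Recombinant CFUs / total CFUs, as read from the table above.
frequencies = {
    "pyr- aro- ilv+": {"mutS+": 1.2e-8, "mutS-": 2.5e-6},
    "pyr- aro+ ilv-": {"mutS+": 2.7e-8, "mutS-": 3.0e-6},
}


def fold_enhancement(freq_muts_minus, freq_muts_plus):
    """Ratio of recombinant frequency in the mutS- recipient to the mutS+ recipient."""
    return freq_muts_minus / freq_muts_plus


for genotype, f in frequencies.items():
    ratio = fold_enhancement(f["mutS-"], f["mutS+"])
    print(f"{genotype}: {ratio:.0f}-fold")
# prints:
# pyr- aro- ilv+: 208-fold
# pyr- aro+ ilv-: 111-fold
```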

U. INTRODUCTION OF FRAGMENTS BY CONJUGATION 

[0333] Mobilizable vectors can also be used to transfer fragment libraries into cells to be evolved. This approach is
particularly useful in situations in which the cells to be evolved cannot be efficiently transformed directly with the fragment 
library but can undergo conjugation with primary cells that can be transformed with the fragment library. 
[0334] DNA fragments to be introduced into host cells encompass diversity relative to the host cell genome. The
diversity can be the result of natural diversity or mutagenesis. The DNA fragment library is cloned into a mobilizable 
vector having an origin of transfer. Some such vectors also contain mob genes although alternatively these functions 
can also be provided in trans. The vector should be capable of efficient conjugal transfer between primary cells and the 
intended host cells. The vector should also confer a selectable phenotype. This phenotype can be the same as the 
phenotype being evolved or can be conferred by a marker, such as a drug resistance marker. The vector should preferably 
allow self-elimination in the intended host cells thereby allowing selection for cells in which a cloned fragment has 
undergone genetic exchange with a homologous host segment rather than duplication. Self-elimination can be achieved by use
of a vector lacking an origin of replication functional in the intended host type or inclusion of a negative selection marker
in the vector. 

[0335] One suitable vector is the broad host range conjugation plasmid described by Simon et al., Bio/Technology 1,
784-791 (1983); Trieu-Cuot et al., Gene 102, 99-104 (1991); Bierman et al., Gene 116, 43-49 (1992). These plasmids
can be transformed into E. coli and then force-mated into bacteria that are difficult or impossible to transform by chemical
or electrical induction of competence. These plasmids contain the origin of the IncP plasmid, oriT. Mobilization functions
are supplied in trans by chromosomally-integrated copies of the necessary genes. Conjugal transfer of DNA can in some
cases be assisted by treatment of the recipient (if gram-positive) with sub-inhibitory concentrations of penicillins (Trieu-Cuot
et al., 1993 FEMS Microbiol. Lett. 109, 19-23). To increase diversity in populations, recursive conjugal mating prior
to screening is performed.

[0336] Cells that have undergone allelic exchange with library fragments can be screened or selected for evolution 
toward a desired phenotype. Subsequent rounds of recombination can be performed by repeating the conjugal transfer
step. The library of fragments can be fresh or can be obtained from some (but not all) of the cells surviving a previous
round of selection/screening. Conjugation-mediated shuffling can be combined with other methods of shuffling. 
V. GENETIC EXCHANGE PROMOTED BY TRANSDUCING PHAGE



[0337] Phage transduction can include the transfer, from one cell to another, of nonviral genetic material within a viral
coat (Masters, in Escherichia coli and Salmonella Cellular and Molecular Biology 2, 2421-2442 (1996)). Perhaps the
two best examples of generalized transducing phage are bacteriophages P1 and P22 of E. coli and S. typhimurium,
respectively. Generalized transducing bacteriophage particles are formed at a low frequency during lytic infection when
viral-genome-sized, double-stranded fragments of host (which serves as donor) chromosomal DNA are packaged into
phage heads. Promiscuous high transducing (HT) mutants of bacteriophage P22 which efficiently package DNA with
little sequence specificity have been isolated. Infection of a susceptible host results in a lysate in which up to 50% of the
phage are transducing particles. Adsorption of the generalized transducing particle to a susceptible recipient cell results
in the injection of the donor chromosomal fragment. RecA-mediated homologous recombination following injection of
the donor fragment can result in the inheritance of donor traits. Another type of phage which achieves quasi random
insertion of DNA into the host chromosome is Mu. For an overview of Mu biology, see Groisman (1991) in Methods in
Enzymology v. 204. Mu can generate a variety of chromosomal rearrangements including deletions, inversions, duplications
and transpositions. In addition, elements which combine the features of P22 and Mu are available, including
Mud-P22, which contains the ends of the Mu genome in place of the P22 att site and int gene. See Berg, supra.
[0338] Generalized transducing phage can be used to exchange genetic material between a population of cells encompassing
genetic diversity and susceptible to infection by the phage. Genetic diversity can be the result of natural
variation between cells, induced mutation of cells or the introduction of fragment libraries into cells. DNA is then exchanged
between cells by generalized transduction. If the phage does not cause lysis of cells, the entire population of cells can
be propagated in the presence of phage. If the phage results in lytic infection, transduction is performed on a split pool
basis. That is, the starting population of cells is divided into two. One subpopulation is used to prepare transducing
phage. The transducing phage are then infected into the other subpopulation. Preferably, infection is performed at high
multiplicity of phage per cell so that few cells remain uninfected. Cells surviving infection are propagated and screened
or selected for evolution toward a desired property. The pool of cells surviving screening/selection can then be shuffled
by a further round of generalized transduction or by other shuffling methods. Recursive split pool transduction is optionally
performed prior to selection to increase the diversity of any population to be screened.
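
The preference for a high multiplicity of infection can be made concrete with the usual Poisson approximation. This is a sketch under an assumption the source does not state: that phage particles adsorb to cells independently and at random, so the number of particles per cell is Poisson-distributed with mean equal to the multiplicity of infection (MOI).

```python
import math


def fraction_uninfected(moi):
    """Poisson estimate of the fraction of cells that adsorb no phage particle."""
    return math.exp(-moi)


def fraction_multiply_infected(moi):
    """Poisson estimate of the fraction of cells that adsorb two or more particles."""
    return 1.0 - math.exp(-moi) * (1.0 + moi)


for moi in (0.1, 1.0, 5.0, 10.0):
    print(f"MOI {moi:4.1f}: uninfected {fraction_uninfected(moi):.4f}, "
          f">=2 particles {fraction_multiply_infected(moi):.4f}")
```

Under this approximation fewer than 1% of cells escape infection at an MOI of 5, and most cells receive more than one particle, which is also the condition under which several donor fragments can enter the same recipient.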

[0339] The efficiency of the above methods can be increased by reducing infection of cells by infectious (nontransducing)
phage and by reducing lysogen formation. The former can be achieved by inclusion of chelators of divalent cations,
such as citrate and EGTA, in culture media. Tail defective transducing phages can be used to allow only a single round
of infection. Divalent cations are required for phage adsorption, and the inclusion of chelating agents therefore provides
a means of preventing unwanted infection. Integration defective (int-) derivatives of generalized transducing phage can
be used to prevent lysogen formation. In a further variation, host cells with defects in mismatch repair gene(s) can be
used to increase recombination between transduced DNA and genomic DNA.

1. Use of Locked in Prophages to Facilitate DNA Shuffling 

[0340] The use of a hybrid, mobile genetic element (locked-in prophages) as a means to facilitate whole genome
shuffling of organisms using phage transduction as a means to transfer DNA from donor to recipient is a preferred
embodiment. One such element (Mud-P22) based on the temperate Salmonella phage P22 has been described for use
in genetic and physical mapping of mutations. See Youderian et al. (1988) Genetics 118:581-592, and Benson and
Goldman (1992) J. Bacteriol. 174(5):1673-1681. Individual Mud-P22 insertions package specific regions of the Salmonella
chromosome into phage P22 particles. Libraries of random Mud-P22 insertions can be readily isolated and induced
to create pools of phage particles packaging random chromosomal DNA fragments. These phage particles can be used
to infect new cells and transfer the DNA from the host into the recipient in the process of transduction. Alternatively, the
packaged chromosomal DNA can be isolated and manipulated further by techniques such as DNA shuffling or any other
mutagenesis technique prior to being reintroduced into cells (especially recD cells for linear DNA) by transformation or
electroporation, where they integrate into the chromosome.

[0341] Either the intact transducing phage particles or isolated DNA can be subjected to a variety of mutagens prior
to reintroduction into cells to enhance the mutation rate. Mutator cell lines such as mutD can also be used for phage
growth. Either method can be used recursively in a process to create genes or strains with desired properties. E. coli
cells carrying a cosmid clone of Salmonella LPS genes are infectable by P22 phage. It is possible to develop similar
genetic elements using other combinations of transposable elements and bacteriophages or viruses as well.
[0342] P22 is a lambdoid phage that packages its DNA into preassembled phage particles (heads) by a "headful"
mechanism. Packaging of phage DNA is initiated at a specific site (pac) and proceeds unidirectionally along a linear,
double stranded, normally concatameric molecule. When the phage head is full (~43 kb), the DNA strand is cleaved, and
packaging of the next phage head is initiated. Locked-in or excision-defective P22 prophages, however, initiate packaging
at their pac site, and then proceed unidirectionally along the chromosome, packaging successive headfuls of chromosomal
DNA (rather than phage DNA). When these transducing phages infect new Salmonella cells they inject the
chromosomal DNA from the original host into the recipient cell, where it can recombine into the chromosome by homologous
recombination, creating a chimeric chromosome. Upon infection of recipient cells at a high multiplicity of infection,
recombination can also occur between incoming transducing fragments prior to recombination into the chromosome.
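
The headful mechanism described above amounts to simple coordinate arithmetic: each particle carries the next ~43 kb downstream of where the previous one ended. The sketch below only illustrates that bookkeeping. The pac position and the 4.8 Mb chromosome length are hypothetical example values, and real packaging is not perfectly precise, so the boundaries should be read as approximate.

```python
HEADFUL_BP = 43_000  # approximate P22 headful size given in the text (~43 kb)


def headful_intervals(pac_position, n_headfuls, chromosome_length,
                      headful_bp=HEADFUL_BP):
    """Return (start, end) coordinates of successive headfuls.

    Packaging starts at pac_position and proceeds in one direction; the
    coordinates wrap around a circular chromosome of chromosome_length bp.
    """
    intervals = []
    start = pac_position % chromosome_length
    for _ in range(n_headfuls):
        end = (start + headful_bp) % chromosome_length
        intervals.append((start, end))
        start = end
    return intervals


# Hypothetical example: prophage pac site at 1,000,000 bp on a 4.8 Mb chromosome.
for i, (start, end) in enumerate(headful_intervals(1_000_000, 3, 4_800_000), 1):
    print(f"headful {i}: {start:,} -> {end:,} bp")
```

Because the region packaged depends only on where the locked-in prophage sits and which way it points, a library of insertions at many sites covers many different chromosomal segments.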

[0343] Integration of such locked-in P22 prophages at various sites in the chromosome allows flanking regions to be
amplified and packaged into phage particles. The Mud-P22 mobile genetic element contains an excision-defective P22
prophage flanked by the ends of phage/transposon Mu. The entire Mud-P22 element can transpose to virtually any
location in the chromosome or other episome (e.g., F', BAC clone) when the Mu A and B proteins are provided in trans.
[0344] A number of embodiments for this type of genetic element are available. In one example, the locked-in prophages
are used as generalized transducing phage to transfer random fragments of a donor chromosome into a recipient. The
Mud-P22 element acts as a transposon when Mu A and B transposase proteins are provided in trans and integrates
copies of itself at random locations in the chromosome. In this way, a library of random chromosomal Mud-P22 insertions
can be generated in a suitable host. When the Mud-P22 prophages in this library are induced, random fragments of
chromosomal DNA will be packaged into phage particles. When these phages infect recipient cells, the chromosomal
DNA is injected and can recombine into the chromosome of the recipient. These recipient cells are screened for a desired
property and cells showing improvement are then propagated. The process can be repeated, since the Mud-P22 genetic
element is not transferred to the recipient in this process. Infection at a high multiplicity allows for multiple chromosomal
fragments to be injected and recombined into the recipient chromosome.

[0345] Locked in prophages can also be used as specialized transducing phage. Individual insertions near a gene of 
interest can be isolated from a random insertion library by a variety of methods. Induction of these specific prophages
results in packaging of flanking chromosomal DNA including the gene(s) of interest into phage particles. Infection of 
recipient cells with these phages and recombination of the packaged DNA into the chromosome creates chimeric genes 
that can be screened for desired properties. Infection at a high multiplicity of infection can allow recombination between 
incoming transducing fragments prior to recombination into the chromosome. 
[0346] These specialized transducing phage can also be used to isolate large quantities of high quality DNA containing
specific genes of interest without any prior knowledge of the DNA sequence. Cloning of specific genes is not required.
Insertion of such an element near a biosynthetic operon, for example, allows for large amounts of DNA from that operon
to be isolated for use in DNA shuffling (in vitro and/or in vivo), cloning, sequencing, or other uses as set forth herein.
DNA isolated from similar insertions in other organisms containing homologous operons is optionally mixed for use in
family shuffling formats as described herein, in which homologous genes from different organisms (or different chromosomal
locations within a single species, or both) are recombined. Alternatively, the transduced population is recursively transduced with
pooled transducing phage or new transducing phage generated from the previously transduced cells. This can be carried
out recursively to optimize the diversity of the genes prior to shuffling.

[0347] Phage isolated from insertions in a variety of strains or organisms containing homologous operons are optionally 
mixed and used to coinfect cells at a high MOI allowing for recombination between incoming transducing fragments prior
to recombination into the chromosome. 

[0348] Locked-in prophages are useful for mapping of genes, operons, and/or specific mutations with either desirable
or undesirable phenotypes. Locked-in prophages can also provide a means to separate and map multiple mutations in
a given host. If one is looking for beneficial mutations outside a gene or operon of interest, then an unmodified gene or
operon can be transduced into a mutagenized or shuffled host, which is then screened for the presence of desired secondary
mutations. Alternatively, the gene/operon of interest can be readily moved from a mutagenized/shuffled host into a
different background to screen/select for modifications in the gene/operon itself.

[0349] It is also possible to develop similar genetic elements using other combinations of transposable elements and
bacteriophages or viruses. Similar systems are set up in other organisms, e.g., those that do not allow replication of
P22 or P1. Broad host range phages and transposable elements are especially useful. Similar genetic elements are
derived from other temperate phages that also package by a headful mechanism. In general, these are the phages that
are capable of generalized transduction. Viruses infecting eukaryotic cells may be adapted for similar purposes. Examples
of generalized transducing phages that are useful are described in: Green et al., "Isolation and preliminary characterization
of lytic and lysogenic phages with wide host range within the streptomycetes", J. Gen. Microbiol. 131(9):2459-2465 (1985);
Studdard et al., "Genome structure in Streptomyces spp.: adjacent genes on the S. coelicolor A3(2) linkage map have
cotransducible analogs in S. venezuelae", J. Bacteriol. 169(8):3814-3816 (1987); Wang et al., "High frequency generalized
transduction by miniMu plasmid phage", Genetics 116(2):201-206 (1987); Welker, N. E., "Transduction in Bacillus
stearothermophilus", J. Bacteriol. 176(11):3354-3359 (1988); Darzins et al., "Mini-D3112 bacteriophage transposable
elements for genetic analysis of Pseudomonas aeruginosa", J. Bacteriol. 171(7):3909-3916 (1989); Hugouvieux-Cotte-Pattat
et al., "Expanded linkage map of Erwinia chrysanthemi strain 3937", Mol. Microbiol. 3(5):573-581 (1989); Ichige
et al., "Establishment of gene transfer systems for and construction of the genetic map of a marine Vibrio strain", J.
Bacteriol. 171(4):1825-1834 (1989); Muramatsu et al., "Two generalized transducing phages in Vibrio parahaemolyticus
and Vibrio alginolyticus", Microbiol. Immunol. 35(12):1073-1084 (1991); Regue et al., "A generalized transducing bacteriophage
for Serratia marcescens", Res. Microbiol. 142(1):23-27 (1991); Kiesel et al., "Phage Acm1-mediated transduction
in the facultatively methanol-utilizing Acetobacter methanolicus MB 58/4", J. Gen. Virol. 74(9):1741-1745 (1993); Blahova
et al., "Transduction of imipenem resistance by the phage F-116 from a nosocomial strain of Pseudomonas aeruginosa
isolated in Slovakia", Acta Virol. 38(5):247-250 (1994); Kidambi et al., "Evidence for phage-mediated gene transfer among
Pseudomonas aeruginosa strains on the phylloplane", Appl. Environ. Microbiol. 60(2):496-500 (1994); Weiss et
al., "Isolation and characterization of a generalized transducing phage for Xanthomonas campestris pv. campestris", J.
Bacteriol. 176(11):3354-3359 (1994); Matsumoto et al., "Clustering of the trp genes in Burkholderia (formerly Pseudomonas)
cepacia", FEMS Microbiol. Lett. 134(2-3):265-271 (1995); Schicklmaier et al., "Frequency of generalized transducing
phages in natural isolates of the Salmonella typhimurium complex", Appl. Environ. Microbiol. 61(4):1637-1640
(1995); Humphrey et al., "Purification and characterization of VSH-1, a generalized transducing bacteriophage of Serpulina
hyodysenteriae", J. Bacteriol. 179(2):323-329 (1997); Willi et al., "Transduction of antibiotic resistance markers
among Actinobacillus actinomycetemcomitans strains by temperate bacteriophages Aa phi 23", Cell. Mol. Life Sci. 53
(11-12):904-910 (1997); Jensen et al., "Prevalence of broad-host-range lytic bacteriophages of Sphaerotilus natans,
Escherichia coli, and Pseudomonas aeruginosa", Appl. Environ. Microbiol. 64(2):575-580 (1998); and Nedelmann et
al., "Generalized transduction for genetic linkage analysis and transfer of transposon insertions in different Staphylococcus
epidermidis strains", Zentralbl. Bakteriol. 287(1-2):85-92.

[0350] A Mud-P1/Tn-P1 system comparable to Mud-P22 is developed using phage P1. Phage P1 has the advantage
of packaging much larger (~110 kb) fragments per headful. Phage P1 is currently used to create bacterial artificial
chromosomes or BACs. P1-based BAC vectors are designed along these principles so that cloned DNA is packaged
into phage particles, rather than the current system, which requires DNA preparation from single-copy episomes. This
combines the advantages of both systems in having the genes cloned in a stable single-copy format, whilst allowing for
amplification and specific packaging of cloned DNA upon induction of the prophage.

W. RANDOM PLACEMENT OF GENES OR IMPROVED GENES THROUGHOUT THE GENOME FOR OPTIMIZATION
OF GENE CONTEXT

[0351] The placement and orientation of genes in a host chromosome (the "context" of the gene in a chromosome)
or episome has large effects on gene expression and activity. Random integration of plasmid or other episomal sequences
into a host chromosome by non-homologous recombination, followed by selection or screening for the desired phenotype,
is a preferred way of identifying optimal chromosomal positions for expression of a target. This strategy is illustrated in
Fig. 18.

[0352] A variety of transposon mediated delivery systems can be employed to deliver genes of interest, either individual 
genes, genomic libraries, or a library of shuffled gene(s) randomly throughout the genome of a host. Thus, in one preferred 
embodiment, the improvement of a cellular function is achieved by cloning a gene of interest, for example a gene encoding
a desired metabolic pathway, within a transposon delivery vehicle.

[0353] Such transposon vehicles are available for both Gram-negative and Gram-positive bacteria. De Lorenzo and
Timmis (1994) Methods in Enzymology 235:385-404 describe the analysis and construction of stable phenotypes in gram-negative
bacteria with Tn5- and Tn10-derived minitransposons. Kleckner et al. (1991) Methods in Enzymology 204,
chapter 7 describe uses of transposons such as Tn10, including for use in gram positive bacteria. Petit et al. (1990)
Journal of Bacteriology 172(12):6736-6740 describe Tn10-derived transposons active in Bacillus subtilis. The transposon
delivery vehicle is introduced into a cell population, which is then selected for recombinant cells that have incorporated
the transposon into the genome.

[0354] The selection is typically by any of a variety of drug resistance markers also carried within the transposon. The
selected subpopulation is screened for cells having improved expression of the gene(s) of interest. Once cells harboring
the genes of interest in the optimal location are isolated, the genes are amplified from within the genome using PCR,
shuffled, and cloned back into a similar transposon delivery vehicle which contains a different selection marker within
the transposon and lacks the transposon integrase gene.

[0355] This shuffled library is then transformed back into the strain harboring the original transposon, and the cells
are selected for the presence of the new resistance marker and the loss of the previous selection marker. Selected cells
are enriched for those that have exchanged, by homologous recombination, the original transposon for the new transposon
carrying members of the shuffled library. The surviving cells are then screened for further improvements in the expression
of the desired phenotype. The genes from the improved cells are then amplified by PCR and shuffled again. This
process is carried out recursively, oscillating each cycle between the different selection markers. Once the gene(s) of
interest are optimized to a desired level, the fragment can be amplified and again randomly distributed throughout the
genome as described above to identify the optimal location of the improved genes.
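
The alternation of selection markers described in paragraph [0355] can be sketched as a simple schedule. This is only an illustrative outline: the function name and the marker names (kanR, cmR) are hypothetical and are not taken from the specification.

```python
from itertools import cycle, islice


def marker_schedule(markers, n_cycles):
    """Return (select_for, select_against) marker pairs, one per cycle.

    Each cycle selects for the marker on the incoming transposon and for
    loss of the marker on the transposon installed in the previous cycle.
    """
    if len(markers) < 2:
        raise ValueError("need at least two distinct selection markers")
    incoming = list(islice(cycle(markers), 1, n_cycles + 1))
    resident = list(islice(cycle(markers), 0, n_cycles))
    return list(zip(incoming, resident))


# Hypothetical example: the original transposon carries kanR.
schedule = marker_schedule(["kanR", "cmR"], 4)
for i, (gain, lose) in enumerate(schedule, start=1):
    print(f"cycle {i}: select for {gain}, require loss of {lose}")
```

With two markers the schedule simply oscillates; passing a longer list models the variant in which each new cycle uses an additional marker.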

[0356] Alternatively, the gene(s) conferring a desired property may not be known. In this case the DNA fragments
cloned within the transposon delivery vehicle could be a library of genomic fragments originating from a population of
cells derived from one or more strains having the desired property(ies). The library is delivered to a population of cells
derived from one or more strains having or lacking the desired property(ies), and cells incorporating the transposon are
selected. The surviving cells are then screened for acquisition or improvement of the desired property. The fragments
contained within the surviving cells are amplified by PCR and then cloned as a pool into a similar transposon delivery
vector harboring a different selection marker from the first delivery vector. This library is then delivered to the pool of
surviving cells, and the population having acquired the new selective marker is selected. The selected cells are then
screened for further acquisition or improvement of the desired property. In this way the different possible combinations
of genes conferring or improving a desired phenotype are explored in a combinatorial fashion. This process is carried
out repetitively, with each new cycle employing an additional selection marker. Alternatively, PCR fragments are cloned
into a pool of transposon vectors having different selective markers. These are delivered to cells and selected for 1, 2, 3,
or more markers.

[0357] Alternatively, the amplified fragments from each improved cell are shuffled independently. The shuffled libraries 
are then cloned back into a transposon delivery vehicle similar to the original vector but containing a different selection 
marker and lacking the transposase gene. Selection is then for acquisition of the new marker and loss of the previous 
marker. Selected cells are enriched for those incorporating the shuffled variants of the amplified genes by homologous 
recombination. This process is carried out recursively, oscillating each cycle between the two selective markers.

X. IMPROVEMENT OF OVEREXPRESSED GENES FOR A DESIRED PHENOTYPE 

[0358] The improvement of a cellular property or phenotype is often enhanced by increasing the copy number or
expression of gene(s) participating in the expression of that property. Genes that have such an effect on a desired
property can also be improved by DNA shuffling to have a similar effect. A genomic DNA library is cloned into an
overexpression vector and transformed into a target cell population such that the genomic fragments are highly expressed
in cells selected for the presence of the overexpression vector. The selected cells are then screened for improvement
of a desired property. The overexpression vectors from the improved cells are isolated and the cloned genomic fragments
shuffled. The genomic fragment carried in the vector from each improved isolate is shuffled independently or with
identified homologous genes (family shuffling). The shuffled libraries are then delivered back to a population of cells and
the selected transformants rescreened for further improvements in the desired property. This shuffling/screening process
is cycled recursively until the desired property has been optimized to the desired level.

[0359] As stated above, gene dosage can greatly enhance a desired cellular property. One method of increasing the
copy number of unknown genes is random amplification (see also Mavingui et al. (1997) Nature
Biotech. 15, 564). In this method, a genomic library is cloned into a suicide vector containing a selective marker that
also provides an enhanced phenotype at higher dosage. An example of such a marker is the kanamycin resistance
gene. At successively higher copy number, resistance to successively higher levels of kanamycin is achieved. The
genomic library is delivered to a target cell by any of a variety of methods including transformation, transduction,
conjugation, etc. Cells that have incorporated the vector into the chromosome by homologous recombination between the
vector and chromosomal copies of the cloned genes can be selected by requiring expression of the selection marker
under conditions where the vector does not replicate. This recombination event results in the duplication of the cloned
DNA fragment in the host chromosome, with a copy of the vector and selection marker separating the two copies. The
population of surviving cells is screened for improvement of a desired cellular property resulting from the gene duplication
event. Further gene duplication events resulting in additional copies of the original cloned DNA fragments can be
generated by further propagating the cells under successively more stringent selective conditions, i.e., increased
concentrations of kanamycin. In this case selection requires increased copies of the selective marker, but increased copies of the
desired gene fragment are also concomitant. Surviving cells are further screened for an improvement in the desired
phenotype. The resulting population of cells likely resulted from the amplification of different genes, since often many genes
affect a given phenotype. To generate a library of the possible combinations of these genes, the original selected library
showing phenotypic improvements is recombined, using the methods described herein, e.g., protoplast fusion, split
pool transduction, transformation, conjugation, etc.

[0360] The recombined cells are selected for increased expression of the selective marker. Survivors are enriched 
for cells having incorporated additional copies of the vector sequence by homologous recombination, and these cells 
will be enriched for those having combined duplications of different genes. In other words, the duplication from one cell
of enhanced phenotype becomes combined with the duplication of another cell of enhanced phenotype. These survivors 
are screened for further improvements in the desired phenotype. This procedure is repeated recursively until the desired 
level of phenotypic expression is achieved. 

[0361] Alternatively, genes that have been identified or are suspected as being beneficial in increased copy number
are cloned in tandem into appropriate plasmid vectors. These vectors are then transformed and propagated in an
appropriate host organism. Plasmid-plasmid recombination between the cloned gene fragments results in further
duplication of the genes. Resolution of the plasmid doublet can result in the uneven distribution of the gene copies, with some
plasmids having additional gene copies and others having fewer gene copies. Cells carrying this distribution of plasmids
are then screened for an improvement in the phenotype affected by the gene duplications.

[0362] In summary, a method of selecting for increased copy number of a nucleic acid sequence by the above procedure
is provided. In the method, a genomic library in a suicide vector comprising a dose-sensitive selectable marker is provided,
as noted above. The genomic library is transduced into a population of target cells. The target cells are selected
for increasing doses of the selectable marker under conditions in which the suicide vector does
not replicate episomally. A plurality of target cells are selected for the desired phenotype, recombined and reselected.
The process is recursively repeated, if desired, until the desired phenotype is obtained.
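
The dose-dependent selection summarized above can be illustrated with a toy model. The sketch below assumes, purely for illustration, that tolerated kanamycin level scales linearly with marker copy number; the isolate names, copy numbers and the `resistance_per_copy` parameter are hypothetical.

```python
def select(population, kan_level, resistance_per_copy=1.0):
    """Keep isolates whose marker copy number tolerates kan_level.

    Assumption (not from the specification): tolerance is linear in
    copy number, i.e. tolerance = copies * resistance_per_copy.
    """
    return {
        isolate: copies
        for isolate, copies in population.items()
        if copies * resistance_per_copy >= kan_level
    }


# Hypothetical isolates mapped to integrated marker copy number.
population = {"isolate_A": 1, "isolate_B": 2, "isolate_C": 4}
rounds = []
for kan_level in (1, 2, 4):  # arbitrary units, increasing stringency
    population = select(population, kan_level)
    rounds.append(sorted(population))
    print(f"kanamycin level {kan_level}: survivors {rounds[-1]}")
```

Because the cloned fragment is duplicated together with the marker, the survivors of each round are also those carrying more copies of the linked genomic fragment, which is what the subsequent phenotypic screen exploits.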

Y. STRATEGIES FOR IMPROVING GENOMIC SHUFFLING VIA TRANSFORMATION OF LINEAR DNA FRAGMENTS 

[0363] Wild-type members of the Enterobacteriaceae (e.g., Escherichia coli) are typically resistant to genetic exchange
following transformation of linear DNA molecules. This is due, at least in part, to the Exonuclease V (Exo V) activity of
the RecBCD holoenzyme, which rapidly degrades linear DNA molecules following transformation. Production of Exo V
has been traced to the recD gene, which encodes the D subunit of the holoenzyme. As demonstrated by Russell et al.
(1989) Journal of Bacteriology 2609-2613, homologous recombination between a transformed linear donor DNA molecule
and the chromosome of the recipient is readily detected in strains bearing a loss-of-function mutation in recD.
The use of recD strains provides a simple means for genomic shuffling of the Enterobacteriaceae. For example, a
bacterial strain or set of related strains bearing a recD null mutation (e.g., the E. coli recD1903::mini-Tet allele) is
mutagenized and screened for the desired properties. In a split-pool fashion, chromosomal DNA prepared from one aliquot
can be used to transform (e.g., via electroporation or chemically induced competence) the second aliquot. The resulting
transformants are then screened for improvement, or recursively transformed prior to screening.
[0364] The use of RecE/RecT, as described supra, can improve homologous recombination of linear DNA fragments.
[0365] The RecBCD holoenzyme plays an important role in initiation of RecA-dependent homologous recombination.
Upon recognizing a dsDNA end, the RecBCD enzyme unwinds and degrades the DNA asymmetrically in a 5' to 3'
direction until it encounters a chi (or "χ") site (consensus 5'-GCTGGTGG-3'), which attenuates the nuclease activity.
This results in the generation of ssDNA terminating near the χ site with a 3'-ssDNA tail that is preferred for RecA
loading and subsequent invasion of dsDNA for homologous recombination. Accordingly, preprocessing of transforming
fragments with a 5' to 3' specific ssDNA exonuclease, such as lambda (λ) exonuclease (available, e.g., from Boehringer
Mannheim), prior to transformation may serve to stimulate homologous recombination in a recD- strain by providing an ssDNA
invasive end for RecA loading and subsequent strand invasion.

[0366] The addition of DNA sequence encoding chi sites (consensus 5'-GCTGGTGG-3') to DNA fragments can serve
to both attenuate Exonuclease V activity and stimulate homologous recombination, thereby obviating the need for a
recD mutation (see also Kowalczykowski et al. (1994) "Biochemistry of homologous recombination in Escherichia coli,"
Microbiol. Rev. 58:401-465 and Jessen et al. (1998) "Modification of bacterial artificial chromosomes through
Chi-stimulated homologous recombination and its application in zebrafish transgenesis," Proc. Natl. Acad. Sci. 95:
5121-5126).

[0367] Chi sites are optionally included in linkers ligated to the ends of transforming fragments or incorporated into
the external primers used to generate DNA fragments to be transformed. The use of recombination-stimulatory sequences
such as chi is a generally useful approach for evolution of a broad range of cell types by fragment transformation.
[0368] Methods to inhibit or mutate analogs of Exo V or other nucleases (such as Exonucleases I (endA1), III (nth),
IV (nfo), VII, and VIII of E. coli) are similarly useful. Inhibition or elimination of nucleases, or modification of ends of
transforming DNA fragments to render them resistant to exonuclease activity, has applications in evolution of a broad
range of cell types.
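
The chi-site handling of paragraphs [0366] and [0367] can be sketched in a few lines. This is a minimal illustration under stated assumptions: the helper names and the example sequences are hypothetical, only the exact consensus 5'-GCTGGTGG-3' is searched, and the sketch does not check whether a site is oriented productively relative to the fragment end.

```python
CHI = "GCTGGTGG"  # consensus chi site, written 5'->3', as given in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_chi_sites(seq):
    """Return sorted (position, strand) tuples for each chi site in seq.

    position is the 0-based index, on the given strand, of the leftmost
    base of the match; strand is '+' (given strand) or '-' (complement).
    """
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", CHI), ("-", reverse_complement(CHI))):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


def add_chi_tail(primer):
    """Prepend the chi consensus to the 5' end of an external primer."""
    return CHI + primer.upper()


# Hypothetical fragment and primer, for illustration only.
fragment_seq = "AAAGCTGGTGGTTTCCACCAGCAAA"
sites = find_chi_sites(fragment_seq)
tailed_primer = add_chi_tail("ATGACCATGATTACG")
print(sites)
print(tailed_primer)
```

A fragment that returns no hits is a candidate for chi-bearing linkers or chi-tailed external primers before transformation.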

Z. SHUFFLING TO OPTIMIZE UNKNOWN INTERACTIONS

[0369] Many observed traits are the result of complex interactions of multiple genes or gene products. Most such 
interactions are still uncharacterized. Accordingly, it is often unclear which genes need to be optimized to achieve a 
desired trait, even if some of the genes contributing to the trait are known. 

[0370] This lack of characterization is not an issue during DNA shuffling, which produces solutions that optimize
whatever is selected for. An alternative approach, which has the potential to solve not only this problem, but also
anticipated future rate limiting factors, is complementation by overexpression of unknown genomic sequences.
[0371] A library of genomic DNA is first made as described, supra. This is transformed into the cell to be optimized
and transformants are screened for increases in a desired property. Genomic fragments which result in an improved
property are evolved by DNA shuffling to further increase their beneficial effect. This approach requires no sequence
information, nor any knowledge or assumptions about the nature of protein or pathway interactions, or even of what
steps are rate-limiting; it relies only on detection of the desired phenotype. This sort of random cloning and subsequent
evolution by DNA shuffling of positively interacting genomic sequences is extremely powerful and generic. A variety of
sources of genomic DNA are used, from isogenic strains to more distantly related species with potentially desirable
properties. In addition, the technique is applicable to any cell for which the molecular biology basics of transformation
and cloning vectors are available, and for any property which can be assayed (preferably in a high-throughput format).
Alternatively, once optimized, the evolved DNA can be returned to the chromosome by homologous recombination or
randomly by phage mediated site-specific recombination.

AA. HOMOLOGOUS RECOMBINATION WITHIN THE CHROMOSOME 

[0372] Homologous recombination within the chromosome is used to circumvent the limitations of plasmid based
evolution and size restrictions. The strategy is similar to that described above for shuffling genes within their chromosomal
context, except that no in vitro shuffling occurs. Instead, the parent strain is treated with mutagens such as ultraviolet
light or nitrosoguanidine, and improved mutants are selected. The improved mutants are pooled and split. Half of the
pool is used to generate random genomic fragments for cloning into a homologous recombination vector. Additional
genomic fragments are optionally derived from related species with desirable properties. The cloned genomic fragments
are homologously recombined into the genomes of the remaining half of the mutant pool, and variants with improved
properties are selected. These are subjected to a further round of mutagenesis, selection and recombination. Again this
process is entirely generic for the improvement of any whole cell biocatalyst for which a recombination vector and an
assay can be developed. Here again, it should be noted that recombination can be performed recursively prior to
screening.

BB. METHODS FOR RECURSIVE SEQUENCE RECOMBINATION 

[0373] Some formats and examples for recursive sequence recombination, sometimes referred to as DNA shuffling
or molecular breeding, have been described by the present inventors and co-workers in copending application, attorney
docket no. 16528A-014612, filed March 25, 1996; PCT/US95/02126, filed February 17, 1995 (published as WO 95/22625);
Stemmer, Science 270, 1510 (1995); Stemmer et al., Gene 164, 49-53 (1995); Stemmer, Bio/Technology 13, 549-553
(1995); Stemmer, Proc. Natl. Acad. Sci. USA 91, 10747-10751 (1994); Stemmer, Nature 370, 389-391 (1994); Crameri
et al., Nature Medicine 2(1):1-3 (1996); and Crameri et al., Nature Biotechnology 14, 315-319 (1996) (each of which
is incorporated by reference in its entirety for all purposes).

[0374] As shown in Figs. 16 and 17, DNA shuffling provides the most rapid technology for evolution of complex new
functions. As shown in Fig. 16, panel A, recombination in DNA shuffling achieves accumulation of multiple beneficial
mutations in a few cycles. In contrast, because of the high frequency of deleterious mutations relative to beneficial ones,
iterative point mutation must build beneficial mutations one at a time, and consequently requires many cycles to reach
the same point. As shown in Fig. 16, panel B, rather than a simple linear sequence of mutation accumulation, DNA
shuffling is a parallel process where multiple problems may be solved independently, and then combined.

1. In Vitro Formats

[0375] One format for shuffling in vitro is illustrated in Fig. 1. The initial substrates for recombination are a pool of
related sequences. The X's in Fig. 1, panel A, show where the sequences diverge. The sequences can be DNA or RNA
and can be of various lengths depending on the size of the gene or DNA fragment to be recombined or reassembled.
Preferably the sequences are from 50 bp to 50 kb.

[0376] The pool of related substrates is converted into overlapping fragments, e.g., from about 5 bp to 5 kb or more,
as shown in Fig. 1, panel B. Often, the size of the fragments is from about 10 bp to 1000 bp, and sometimes the size of
the DNA fragments is from about 100 bp to 500 bp. The conversion can be effected by a number of different methods,
such as DNase I or RNase digestion, random shearing or partial restriction enzyme digestion. Alternatively, the conversion
of substrates to fragments can be effected by incomplete PCR amplification of substrates or PCR primed from a single
primer. Alternatively, appropriate single-stranded fragments can be generated on a nucleic acid synthesizer. The
concentration of nucleic acid fragments of a particular length and sequence is often less than 0.1% or 1% by weight of the
total nucleic acid. The number of different specific nucleic acid fragments in the mixture is usually at least about 100,
500 or 1000.
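
The conversion of a substrate into overlapping fragments can be mimicked in silico. The sketch below is illustrative only: it draws fragments uniformly at random, which is an assumption and not a model of DNase I or shearing chemistry, and the function name, defaults (taken from the 100 bp to 500 bp range mentioned above) and test sequence are hypothetical.

```python
import random


def fragment(seq, min_len=100, max_len=500, n_fragments=200, rng=None):
    """Draw random, possibly overlapping fragments of seq.

    Fragment lengths are uniform in [min_len, max_len] and start
    positions are uniform over the positions where the fragment fits.
    """
    rng = rng or random.Random()
    if not 0 < min_len <= max_len <= len(seq):
        raise ValueError("require 0 < min_len <= max_len <= len(seq)")
    fragments = []
    for _ in range(n_fragments):
        length = rng.randint(min_len, max_len)
        start = rng.randint(0, len(seq) - length)
        fragments.append(seq[start:start + length])
    return fragments


# Hypothetical 2 kb substrate, seeded for reproducibility.
rng = random.Random(0)
parent = "".join(rng.choice("ACGT") for _ in range(2000))
frags = fragment(parent, rng=rng)
print(len(frags), min(map(len, frags)), max(map(len, frags)))
```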

[0377] The mixed population of nucleic acid fragments is converted to at least partially single-stranded form.
Conversion can be effected by heating to about 80 °C to 100 °C, more preferably from 90 °C to 96 °C, to form single-stranded
nucleic acid fragments and then reannealing. Conversion can also be effected by treatment with single-stranded DNA
binding protein or recA protein. Single-stranded nucleic acid fragments having regions of sequence identity with other
single-stranded nucleic acid fragments can then be reannealed by cooling to 4 °C to 75 °C, and preferably from 40 °C to
65 °C. Renaturation can be accelerated by the addition of polyethylene glycol (PEG), other volume-excluding reagents
or salt. The salt concentration is preferably from 0 mM to 200 mM, more preferably the salt concentration is from 10 mM
to 100 mM. The salt may be KCl or NaCl. The concentration of PEG is preferably from 0% to 20%, more preferably from
5% to 10%. The fragments that reanneal can be from different substrates as shown in Fig. 1, panel C. The annealed
nucleic acid fragments are incubated in the presence of a nucleic acid polymerase, such as Taq or Klenow, or proofreading
polymerases, such as Pfu or Pwo, and dNTPs (i.e., dATP, dCTP, dGTP and dTTP). If regions of sequence identity are
large, Taq polymerase can be used with an annealing temperature of between 45-65 °C. If the areas of identity are small,
Klenow polymerase can be used with an annealing temperature of between 20-30 °C (Stemmer, Proc. Natl. Acad. Sci.
USA (1994), supra). The polymerase can be added to the random nucleic acid fragments prior to annealing,
simultaneously with annealing or after annealing.
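
The numerical ranges in paragraph [0377] lend themselves to a simple planning check. The sketch below is only a bookkeeping aid under stated assumptions: the class and field names are hypothetical, and the ranges are the broad and preferred values quoted above, read as degrees Celsius, millimolar salt and percent PEG.

```python
from dataclasses import dataclass

# name -> ((broad min, broad max), (preferred min, preferred max))
RANGES = {
    "denature_c": ((80, 100), (90, 96)),
    "anneal_c": ((4, 75), (40, 65)),
    "salt_mm": ((0, 200), (10, 100)),
    "peg_pct": ((0, 20), (5, 10)),
}


@dataclass
class ReassemblyConditions:
    denature_c: float
    anneal_c: float
    salt_mm: float
    peg_pct: float

    def classify(self):
        """Label each parameter 'preferred', 'allowed' or 'out of range'."""
        labels = {}
        for name, (broad, preferred) in RANGES.items():
            value = getattr(self, name)
            if preferred[0] <= value <= preferred[1]:
                labels[name] = "preferred"
            elif broad[0] <= value <= broad[1]:
                labels[name] = "allowed"
            else:
                labels[name] = "out of range"
        return labels


# Hypothetical protocol, for illustration only.
report = ReassemblyConditions(
    denature_c=94, anneal_c=30, salt_mm=50, peg_pct=25
).classify()
print(report)
```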

[0378] The process of denaturation, renaturation and incubation in the presence of polymerase of overlapping
fragments to generate a collection of polynucleotides containing different permutations of fragments is sometimes referred
to as shuffling of the nucleic acid in vitro. This cycle is repeated for a desired number of times. Preferably the cycle is
repeated from 2 to 100 times, more preferably the cycle is repeated from 10 to 40 times. The resulting nucleic acids
are a family of double-stranded polynucleotides of from about 50 bp to about 100 kb, preferably from 500 bp to 50 kb,
as shown in Fig. 1, panel D. The population represents variants of the starting substrates showing substantial sequence
identity thereto but also diverging at several positions. The population has many more members than the starting
substrates. The population of fragments resulting from shuffling is used to transform host cells, optionally after cloning into
a vector.
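
The outcome of the reassembly cycles, a population of chimeras that resemble the starting substrates but diverge at several positions, can be illustrated with a toy simulation. The sketch below is not a model of the annealing and extension chemistry: it assumes pre-aligned parents of equal length and copies consecutive blocks from randomly chosen parents; the function name, the `mean_block` parameter and the example parents are hypothetical.

```python
import random


def shuffle_parents(parents, n_products, mean_block=50, rng=None):
    """Build chimeras by copying successive blocks from random parents.

    Block lengths are drawn from an exponential distribution with mean
    mean_block (minimum 1), so crossovers fall at random positions.
    """
    rng = rng or random.Random()
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("this sketch assumes aligned parents of equal length")
    products = []
    for _ in range(n_products):
        pos, pieces = 0, []
        while pos < length:
            block = max(1, int(rng.expovariate(1.0 / mean_block)))
            template = rng.choice(parents)
            pieces.append(template[pos:pos + block])
            pos += block
        products.append("".join(pieces))
    return products


# Two artificial parents that differ at every position, so that the
# parental origin of each block is visible in the output.
rng = random.Random(1)
chimeras = shuffle_parents(["A" * 300, "C" * 300], n_products=5, rng=rng)
for c in chimeras:
    print(c[:60])
```

Every position of every product carries the base of one of the parents at that position, which is the property that distinguishes recombination from random mutagenesis.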

[0379] In a variation of in vitro shuffling, subsequences of recombination substrates can be generated by amplifying
the full-length sequences under conditions which produce a substantial fraction, typically at least 20 percent or more,
of incompletely extended amplification products. The amplification products, including the incompletely extended
amplification products, are denatured and subjected to at least one additional cycle of reannealing and amplification. This
variation, in which at least one cycle of reannealing and amplification provides a substantial fraction of incompletely
extended products, is termed "stuttering." In the subsequent amplification round, the incompletely extended products
reanneal to and prime extension on different sequence-related template species.

[0380] In a further variation, a mixture of fragments is spiked with one or more oligonucleotides. The oligonucleotides
can be designed to include precharacterized mutations of a wildtype sequence, or sites of natural variations between
individuals or species. The oligonucleotides also include sufficient sequence or structural homology flanking such
mutations or variations to allow annealing with the wildtype fragments. Some oligonucleotides may be random sequences.
Annealing temperatures can be adjusted depending on the length of homology.

[0381] In a further variation, recombination occurs in at least one cycle by template switching, such as when a DNA
fragment derived from one template primes on the homologous position of a related but different template. Template
switching can be induced by addition of recA, rad51, rad55, rad57 or other polymerases (e.g., viral polymerases, reverse
transcriptase) to the amplification mixture. Template switching can also be increased by increasing the DNA template
concentration.

[0382] In a further variation, at least one cycle of amplification can be conducted using a collection of overlapping
single-stranded DNA fragments of related sequence and different lengths. Fragments can be prepared using a
single-stranded DNA phage, such as M13. Each fragment can hybridize to and prime polynucleotide chain extension of a
second fragment from the collection, thus forming sequence-recombined polynucleotides. In a further variation, ssDNA
fragments of variable length can be generated from a single primer by Vent or other DNA polymerase on a first DNA
template. The single-stranded DNA fragments are used as primers for a second, Kunkel-type template, consisting of a
uracil-containing circular ssDNA. This results in multiple substitutions of the first template into the second. See Levichkin
et al., Mol. Biology 29, 572-577 (1995).

2. In Vivo Formats

(a). Plasmid-Plasmid Recombination 

[0383] The initial substrates for recombination are a collection of polynucleotides comprising variant forms of a gene.
The variant forms often show substantial sequence identity to each other sufficient to allow homologous recombination
between substrates. The diversity between the polynucleotides can be natural (e.g., allelic or species variants), induced
(e.g., error-prone PCR), or the result of in vitro recombination. Diversity can also result from resynthesizing genes
encoding natural proteins with alternative and/or mixed codon usage. There should be at least sufficient diversity between
substrates that recombination can generate more diverse products than there are starting materials. There must be at
least two substrates differing in at least two positions. However, commonly a library of substrates of 10^-10^ members
is employed. The degree of diversity depends on the length of the substrate being recombined and the extent of the
functional change to be evolved. Diversity at between 0.1-50% of positions is typical. The diverse substrates are
incorporated into plasmids. The plasmids are often standard cloning vectors, e.g., bacterial multicopy plasmids. However, in
some methods to be described below, the plasmids include mobilization functions. The substrates can be incorporated
into the same or different plasmids. Often at least two different types of plasmid having different types of selection marker
are used to allow selection for cells containing at least two types of vector. Also, where different types of plasmid are
employed, the different plasmids can come from two distinct incompatibility groups to allow stable co-existence of two
different plasmids within the cell. Nevertheless, plasmids from the same incompatibility group can still co-exist within
the same cell for sufficient time to allow homologous recombination to occur.
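
The requirement for at least two substrates differing in at least two positions follows from a simple count: two aligned substrates that differ at n positions can give rise to at most 2^n distinct sequences, so n = 1 yields nothing beyond the two parents, whereas n = 2 already yields two new combinations. The sketch below illustrates this; the function names and sequences are hypothetical, and the substrates are assumed to be aligned and of equal length.

```python
def differing_positions(a, b):
    """Return the 0-based positions at which aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes aligned, equal-length sequences")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def max_recombinants(a, b):
    """Upper bound on distinct sequences obtainable by recombining a and b."""
    return 2 ** len(differing_positions(a, b))


def percent_diversity(a, b):
    """Percentage of positions at which the two substrates differ."""
    return 100.0 * len(differing_positions(a, b)) / len(a)


# Hypothetical substrates differing at two of eight positions.
sub_a, sub_b = "ATGCATGC", "ATGAATGT"
print(differing_positions(sub_a, sub_b))   # positions that differ
print(max_recombinants(sub_a, sub_b))      # parents plus new combinations
print(percent_diversity(sub_a, sub_b))     # percent of positions differing
```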

[0384] Plasmids containing diverse substrates are initially introduced into prokaryotic or eukaryotic cells by any
transfection method (e.g., chemical transformation, natural competence, electroporation, viral transduction or biolistics).
Often, the plasmids are present at or near saturating concentration (with respect to maximum transfection capacity) to
increase the probability of more than one plasmid entering the same cell. The plasmids containing the various substrates
can be transfected simultaneously or in multiple rounds. For example, in the latter approach cells can be transfected
with a first aliquot of plasmid, transfectants selected and propagated, and then transfected with a second aliquot of plasmid.
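
Why a higher plasmid load favors co-transfection can be illustrated with a simple probability estimate. The specification does not state an uptake model; the sketch below assumes, purely for illustration, that the number of plasmids entering a cell is Poisson-distributed with a given mean, and the function name is hypothetical.

```python
import math


def p_at_least_two(mean_plasmids_per_cell):
    """P(a cell takes up >= 2 plasmids) under an assumed Poisson model.

    P(k >= 2) = 1 - P(0) - P(1) = 1 - exp(-m) * (1 + m)
    """
    m = mean_plasmids_per_cell
    if m < 0:
        raise ValueError("mean must be non-negative")
    return 1.0 - math.exp(-m) * (1.0 + m)


for m in (0.1, 1.0, 3.0):
    print(f"mean {m}: P(>=2 plasmids) = {p_at_least_two(m):.3f}")
```

Under this assumption the fraction of cells able to support plasmid-plasmid recombination rises steeply with the mean number of plasmids taken up per cell.
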
[0385] Having introduced the plasmids into cells, recombination between substrates to generate recombinant genes
occurs within cells containing multiple different plasmids merely by propagating the cells. However, cells that receive
only one plasmid are unable to participate in recombination and the potential contribution of substrates on such plasmids
to evolution is not fully exploited (although these plasmids may contribute to some extent if they are propagated in
mutator cells or otherwise accumulate point mutations, e.g., by ultraviolet radiation treatment). The rate of evolution can
be increased by allowing all substrates to participate in recombination. Such can be achieved by subjecting transfected
cells to electroporation. The conditions for electroporation are the same as those conventionally used for introducing
exogenous DNA into cells (e.g., 1,000-2,500 volts, 400 μF and a 1-2 mm gap). Under these conditions, plasmids are
exchanged between cells, allowing all substrates to participate in recombination. In addition, the products of recombination
can undergo further rounds of recombination with each other or with the original substrate. The rate of evolution can
also be increased by use of conjugative transfer. Conjugative transfer systems are known in many bacteria (E. coli, P.
aeruginosa, S. pneumoniae, and H. influenzae) and can also be used to transfer DNA between bacteria and yeast or
between bacteria and mammalian cells.

[0386] To exploit conjugative transfer, substrates are cloned into plasmids having MOB genes, and tra genes are also
provided in cis or in trans to the MOB genes. The effect of conjugative transfer is very similar to electroporation in that
it allows plasmids to move between cells and allows recombination between any substrate and the products of previous
recombination to occur merely by propagating the culture. The details of how conjugative transfer is exploited in these
vectors are discussed in more detail below. The rate of evolution can also be increased by fusing protoplasts of cells to
induce exchange of plasmids or chromosomes. Fusion can be induced by chemical agents, such as PEG, or viruses or
viral proteins, such as influenza virus hemagglutinin, HSV-1 gB and gD. The rate of evolution can also be increased by
use of mutator host cells (e.g., Mut L, S, D, T, H and Ataxia telangiectasia human cell lines).

[0387] Alternatively, plasmids can be propagated together to encourage recombination, then isolated, pooled, and
reintroduced into cells. The combination of plasmids is different in each cell and recombination further increases the
sequence diversity within the population. This is optionally carried out recursively until the desired level of diversity is
achieved. The population is then screened and selected and this process optionally repeated with any selected
cells/plasmids.

[0388] The time for which cells are propagated and recombination is allowed to occur, of course, varies with the cell 
type but is generally not critical, because even a small degree of recombination can substantially increase diversity 
relative to the starting materials. Cells bearing plasmids containing recombined genes are subject to screening or selection
for a desired function. For example, if the substrate being evolved contains a drug resistance gene, one selects for drug 
resistance. Cells surviving screening or selection can be subjected to one or more rounds of screening/selection followed 
by recombination or can be subjected directly to an additional round of recombination. 

[0389] The next round of recombination can be achieved by several different formats independently of the previous
round. For example, a further round of recombination can be effected simply by resuming the electroporation or
conjugation-mediated intercellular transfer of plasmids described above. Alternatively, a fresh substrate or substrates, the
same or different from previous substrates, can be transfected into cells surviving selection/screening. Optionally, the
new substrates are included in plasmid vectors bearing a different selective marker and/or from a different incompatibility
group than the original plasmids. As a further alternative, cells surviving selection/screening can be subdivided into two
subpopulations, and plasmid DNA from one subpopulation transfected into the other, where the substrates from the
plasmids from the two subpopulations undergo a further round of recombination. In either of the latter two options, the
rate of evolution can be increased by employing DNA extraction, electroporation, conjugation or mutator cells, as
described above. In a still further variation, DNA from cells surviving screening/selection can be extracted and subjected
to in vitro DNA shuffling.

[0390] After the second round of recombination, a second round of screening/selection is performed, preferably under
conditions of increased stringency. If desired, further rounds of recombination and selection/screening can be performed
using the same strategy as for the second round. With successive rounds of recombination and selection/screening, the
surviving recombined substrates evolve toward acquisition of a desired phenotype. Typically, in this and other methods
of recursive recombination, the final product of recombination that has acquired the desired phenotype differs from
starting substrates at 0.1%-25% of positions and has evolved at a rate orders of magnitude in excess (e.g., by at least
10-fold, 100-fold, 1000-fold, or 10,000-fold) of the rate of naturally acquired mutation of about 1 mutation per 10^9 positions
per generation (see Anderson & Hughes, Proc. Natl. Acad. Sci. USA 93, 906-907 (1996)). As with other techniques
herein, recombination steps can be performed recursively to enhance diversity prior to screening. In addition, the entire
process can be performed in a recursive manner to generate desired organisms, clones or nucleic acids.

3. Virus-Plasmid Recombination 

[0391] The strategy used for plasmid-plasmid recombination can also be used for virus-plasmid recombination; usually,
phage-plasmid recombination. However, some additional comments particular to the use of viruses are appropriate. The
initial substrates for recombination are cloned into both plasmid and viral vectors. It is usually not critical which
substrate(s) are inserted into the viral vector and which into the plasmid, although usually the viral vector should contain different
substrate(s) from the plasmid. As before, the plasmid (and the virus) typically contains a selective marker. The plasmid
and viral vectors can both be introduced into cells by transfection as described above. However, a more efficient procedure
is to transform the cells with plasmid, select transformants and infect the transformants with a virus. Because the efficiency
of infection of many viruses approaches 100% of cells, most cells transformed and infected by this route contain both
a plasmid and virus bearing different substrates.

[0392] Homologous recombination occurs between plasmid and virus, generating both recombined plasmids and
recombined virus. For some viruses, such as filamentous phage, in which intracellular DNA exists in both double-stranded
and single-stranded forms, both can participate in recombination. Provided that the virus is not one that rapidly kills cells,
recombination can be augmented by use of electroporation or conjugation to transfer plasmids between cells.
Recombination can also be augmented for some types of virus by allowing the progeny virus from one cell to reinfect other
cells. For some types of virus, virus-infected cells show resistance to superinfection. However, such resistance can be
overcome by infecting at high multiplicity and/or using mutant strains of the virus in which resistance to superinfection
is reduced. Recursive infection and transformation prior to screening can be performed to enhance diversity.
[0393] The result of infecting plasmid-containing cells with virus depends on the nature of the virus. Some viruses,
such as filamentous phage, stably exist with a plasmid in the cell and also extrude progeny phage from the cell. Other
viruses, such as lambda having a cosmid genome, stably exist in a cell like plasmids without producing progeny virions.
Other viruses, such as the T-phage and lytic lambda, undergo recombination with the plasmid but ultimately kill the host
cell and destroy plasmid DNA. For viruses that infect cells without killing the host, cells containing recombinant plasmids
and virus can be screened/selected using the same approach as for plasmid-plasmid recombination. Progeny virus
extruded by cells surviving selection/screening can also be collected and used as substrates in subsequent rounds of
recombination. For viruses that kill their host cells, recombinant genes resulting from recombination reside only in the
progeny virus. If the screening or selective assay requires expression of recombinant genes in a cell, the recombinant
genes should be transferred from the progeny virus to another vector, e.g., a plasmid vector, and retransfected into cells
before selection/screening is performed.

[0394] For filamentous phage, the products of recombination are present both in cells surviving recombination and in
phage extruded from these cells. The dual source of recombinant products provides some additional options relative to
the plasmid-plasmid recombination. For example, DNA can be isolated from phage particles for use in a round of in vitro
recombination. Alternatively, the progeny phage can be used to transfect or infect cells surviving a previous round of
screening/selection, or fresh cells transfected with fresh substrates for recombination.

4. Virus-Virus Recombination 

[0395] The principles described for plasmid-plasmid and plasmid-viral recombination can be applied to virus-virus
recombination with a few modifications. The initial substrates for recombination are cloned into a viral vector. Usually,
the same vector is used for all substrates. Preferably, the virus is one that, naturally or as a result of mutation, does not
kill cells. After insertion, some viral genomes can be packaged in vitro. The packaged viruses are used to infect cells at
high multiplicity such that there is a high probability that a cell receives multiple viruses bearing different substrates.

[0396] After the initial round of infection, subsequent steps depend on the nature of infection as discussed in the
previous section. For example, if the viruses have phagemid genomes such as lambda cosmids or M13, F1 or Fd
phagemids, the phagemids behave as plasmids within the cell and undergo recombination simply by propagating in the
cells. Recombination and sequence diversity can be enhanced by electroporation of cells. Following selection/screening,
cosmids containing recombinant genes can be recovered from surviving cells (e.g., by heat induction of a cos lysogenic
host cell), repackaged in vitro, and used to infect fresh cells at high multiplicity for a further round of recombination.

[0397] If the viruses are filamentous phage, recombination of replicating form DNA occurs by propagating the culture
of infected cells. Selection/screening identifies colonies of cells containing viral vectors having recombinant genes with
improved properties, together with phage extruded from such cells. Subsequent options are essentially the same as for
plasmid-viral recombination.

5. Chromosome-Plasmid Recombination 

[0398] This format can be used to evolve both the chromosomal and plasmid-borne substrates. The format is particularly
useful in situations in which many chromosomal genes contribute to a phenotype or one does not know the exact location
of the chromosomal gene(s) to be evolved. The initial substrates for recombination are cloned into a plasmid vector. If
the chromosomal gene(s) to be evolved are known, the substrates constitute a family of sequences showing a high
degree of sequence identity but some divergence from the chromosomal gene. If the chromosomal genes to be evolved
have not been located, the initial substrates usually constitute a library of DNA segments of which only a small number
show sequence identity to the gene(s) to be evolved. Divergence between plasmid-borne substrate and the
chromosomal gene(s) can be induced by mutagenesis or by obtaining the plasmid-borne substrates from a different
species than that of the cells bearing the chromosome.

[0399] The plasmids bearing substrates for recombination are transfected into cells having chromosomal gene(s) to
be evolved. Evolution can occur simply by propagating the culture, and can be accelerated by transferring plasmids
between cells by conjugation, electroporation or protoplast fusion. Evolution can be further accelerated by use of mutator
host cells or by seeding a culture of nonmutator host cells being evolved with mutator host cells and inducing intercellular
transfer of plasmids by electroporation, conjugation or protoplast fusion. Alternatively, recursive isolation and
transformation can be used. Preferably, mutator host cells used for seeding contain a negative selection marker to facilitate
isolation of a pure culture of the nonmutator cells being evolved. Selection/screening identifies cells bearing chromosomes
and/or plasmids that have evolved toward acquisition of a desired function.

[0400] Subsequent rounds of recombination and selection/screening proceed in similar fashion to those described for
plasmid-plasmid recombination. For example, further recombination can be effected by propagating cells surviving
recombination in combination with electroporation, conjugative transfer of plasmids, or protoplast fusion. Alternatively,
plasmids bearing additional substrates for recombination can be introduced into the surviving cells. Preferably, such
plasmids are from a different incompatibility group and bear a different selective marker than the original plasmids to
allow selection for cells containing at least two different plasmids. As a further alternative, plasmid and/or chromosomal
DNA can be isolated from a subpopulation of surviving cells and transfected into a second subpopulation. Chromosomal
DNA can be cloned into a plasmid vector before transfection.

6. Virus-Chromosome Recombination 

[0401] As in the other methods described above, the virus is usually one that does not kill the cells, and is often a
phage or phagemid. The procedure is substantially the same as for plasmid-chromosome recombination. Substrates
for recombination are cloned into the vector. Vectors including the substrates can then be transfected into cells or in
vitro packaged and introduced into cells by infection. Viral genomes recombine with host chromosomes merely by
propagating a culture. Evolution can be accelerated by allowing intercellular transfer of viral genomes by electroporation,
or reinfection of cells by progeny virions. Screening/selection identifies cells having chromosomes and/or viral genomes
that have evolved toward acquisition of a desired function.

[0402] There are several options for subsequent rounds of recombination. For example, viral genomes can be
transferred between cells surviving selection/recombination by recursive isolation and transfection and electroporation.
Alternatively, viruses extruded from cells surviving selection/screening can be pooled and used to superinfect the cells at
high multiplicity. Alternatively, fresh substrates for recombination can be introduced into the cells, either on plasmid or
viral vectors.

CC. POOLWISE WHOLE GENOME RECOMBINATION 

[0403] Asexual evolution is a slow and inefficient process. Populations move as individuals rather than as a group. A
diverse population is generated by mutagenesis of a single parent, resulting in a distribution of fit and unfit individuals.
In the absence of a sexual cycle, each piece of genetic information for the surviving population remains in the individual
mutants. Selection of the fittest results in many fit individuals being discarded, along with the genetically useful information
they carry. Asexual evolution proceeds one genetic event at a time, and is thus limited by the intrinsic value of a single
genetic event. Sexual evolution moves more quickly and efficiently. Mating within a population consolidates genetic
information within the population and results in useful information being combined together. The combining of useful
genetic information results in progeny that are much more fit than their parents. Sexual evolution thus proceeds much
faster by multiple genetic events. These differences are further illustrated in Fig. 17. In contrast to sexual evolution, DNA
shuffling is the recursive mutagenesis, recombination, and selection of DNA sequences (see also Fig. 25).

[0404] Sexual recombination in nature effects pairwise recombination and results in progeny that are genetic hybrids
of two parents. In contrast, DNA shuffling in vitro effects poolwise recombination, in which progeny are hybrids of multiple
parental molecules. This is because DNA shuffling effects many individual pairwise recombination events with each
thermal cycle. After many cycles the result is a repetitively inbred population, with the "progeny" being the Fx (for X
cycles of reassembly) of the original parental molecules. These progeny are potentially descendants of many or all of
the original parents. The graph shown in Fig. 25 shows a plot of the potential number of mutations an individual can
accumulate by sequential, pairwise and poolwise recombination.

[0405] Poolwise recombination is an important feature of DNA shuffling in that it provides a means of generating a
greater proportion of the possible combinations of mutations from a single "breeding" experiment. In this way, the "genetic
potential" of a population can be readily assessed by screening the progeny of a single DNA shuffling experiment.

[0406] For example, if a population consists of 10 single mutant parents, there are 2^10 = 1024 possible combinations
of those mutations, ranging from progeny having 0-10 mutations. Of these 1024, only 56 will result from a single pairwise
cross (Fig. 14) (i.e., those having 0, 1, and 2 mutations). In nature the multiparent combinations will eventually arise after
multiple random sexual matings, assuming no selection is imparted to remove some mutations from the population. In
this way, sex effects the consolidation and sampling of all useful mutant combinations possible within a population. For
the purposes of directed evolution, having the greatest number of mutant combinations entering a screen or selection
is desirable so that the best progeny (i.e., according to the selection criteria used in the selection screen) is identified in
the shortest possible time.
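
The counts quoted in paragraph [0406] can be checked with a short calculation. The following is a minimal sketch, assuming each parent carries one distinct mutation and a progeny genotype is defined only by which of those mutations it carries; the function names are illustrative and do not appear in the specification.

```python
from math import comb


def combinations_by_mutation_count(n_parents: int) -> dict[int, int]:
    """Map k -> number of distinct genotypes carrying exactly k of the n mutations."""
    return {k: comb(n_parents, k) for k in range(n_parents + 1)}


def reachable_in_one_pairwise_cross(n_parents: int) -> int:
    """Count genotypes with 0, 1, or 2 mutations.

    A single cross between two single-mutant parents can combine at most
    two mutations, so only these classes are reachable in one cycle.
    """
    return sum(comb(n_parents, k) for k in range(3))


if __name__ == "__main__":
    counts = combinations_by_mutation_count(10)
    print("all combinations:", sum(counts.values()))
    print("from one pairwise cross:", reachable_in_one_pairwise_cross(10))
```

For ten single mutant parents this gives 1024 combinations in total, of which 1 + 10 + 45 = 56 carry two or fewer mutations, in agreement with the figures above.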

[0407] One challenge to in vivo and whole genome shuffling is devising methods for effecting poolwise recombination
or multiple repetitive pairwise recombination events. In crosses with a single pairwise cross per cycle before screening,
the ability to screen the "genetic potential" of the starting population is limited. For this reason, the rate of in vivo and
whole genome shuffling mediated cellular evolution would be facilitated by effecting poolwise recombination. Two
strategies for poolwise recombination are described below (protoplast fusion and transduction).

1. Protoplast Fusion:

[0408] Protoplast fusion (discussed supra) mediated whole genome shuffling (WGS) is one format that can directly
effect poolwise recombination. Whole genome shuffling is the recursive recombination of whole genomes, in the form of
one or more nucleic acid molecule(s) (fragments, chromosomes, episomes, etc.), from a population of organisms, resulting
in the production of new organisms having distributed genetic information from at least two of the starting population of
organisms. The process of protoplast fusion is further illustrated in Fig. 26.

[0409] Progeny resulting from the fusion of multiple parent protoplasts have been observed (Hopwood & Wright, 1978);
however, these progeny are rare (10^-4 to 10^-6). The low frequency is attributed to the distribution of fusants arising from
two, three, four, etc. parents and the likelihood of the multiple recombination events (6 crossovers for a four parent cross)
that would have to occur for multiparent progeny to arise. Thus, it is useful to enrich for the multiparent progeny. This
can be accomplished, e.g., by repetitive fusion or enrichment for multiply fused protoplasts. The process of poolwise
fusion and recombination is further illustrated in Fig. 27.

2. Repetitive Fusion: 

[0410] Protoplasts of identified parental cells are prepared, fused and regenerated. Protoplasts of the regenerated
progeny are then, without screening or enrichment, formed, fused and regenerated. This can be carried out for two,
three, or more cycles before screening to increase the representation of multiparent progeny. The number of possible
mutations/progeny doubles for each cycle. For example, if one cross produces predominantly progeny with 0, 1, and 2
mutations, a breeding of this population with itself will produce progeny with 0, 1, 2, 3, and 4 mutations (Fig. 15), the
third cross up to eight, etc. The representation of the multiparent progeny from these subsequent crosses will not be as
high as the single and double parent progeny, but it will be detectable and much higher than from a single cross. The
repetitive fusion prior to screening is analogous to many sexual crosses within a population, and the individual thermal
cycles of in vitro DNA shuffling described supra. A factor affecting the value of this approach is the starting size of the
parental population. As the population grows, it becomes more likely that a multiparent fusion will arise from repetitive
fusions. For example, if 4 parents are fused twice, the 4 parent progeny will make up approximately 0.2% of the total
progeny. This is sufficient to find in a population of 3000 (95% confidence), but better representation is preferable. If ten
parents are fused twice, >20% of the progeny will be four parent offspring.
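
The doubling rule and the screening estimate in paragraph [0410] can be illustrated numerically. The following is a hedged sketch: it assumes each screened colony is an independent draw in which the desired progeny class occurs at a fixed frequency (a simple binomial model that the specification does not state), and the function names are illustrative only.

```python
import math


def max_mutations_after_cycles(cycles: int) -> int:
    """Upper bound on mutations per progeny when the bound doubles each fusion cycle."""
    return 2 ** cycles


def prob_at_least_one(frequency: float, screened: int) -> float:
    """Probability that a screen of `screened` colonies contains at least one
    progeny of a class present at the given frequency (independent draws assumed)."""
    return 1.0 - (1.0 - frequency) ** screened


def screen_size_for_confidence(frequency: float, confidence: float = 0.95) -> int:
    """Smallest number of colonies giving at least `confidence` probability of
    observing one or more progeny of the class (same independence assumption)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - frequency))


if __name__ == "__main__":
    for cycle in (1, 2, 3):
        print(cycle, "cycle(s): up to", max_mutations_after_cycles(cycle), "mutations")
    print("P(find >= 1 four-parent progeny in 3000):", prob_at_least_one(0.002, 3000))
    print("colonies needed at 0.2% for 95%:", screen_size_for_confidence(0.002))
```

Under this simple model, a screen of 3000 colonies comfortably exceeds the 95% threshold for a class present at 0.2% (roughly 1,500 colonies would suffice), so the population size quoted above is on the conservative side.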

3. Enrichment for multiply fused protoplasts:

[0411] After the fusion of a population of protoplasts, the fusants are typically diluted into hypotonic medium, to dilute
out the fusing agent (e.g., 50% PEG). The fused cells can be grown for a short period to regenerate cell walls, or used
directly, and are then separated on the basis of size. This is carried out, e.g., by cell sorting, using light dispersion as an
estimate of size, to isolate the largest fusants. Alternatively, the fusants can be sorted by FACS on the basis of DNA
content. The large fusants, or those containing more DNA, result from the fusion of multiple parents and are more likely
to segregate to multiparent progeny. The enriched fusants are regenerated and screened directly, or the progeny are
fused recursively as above to further enrich the population for diverse mutant combinations.

4. Transduction: 

[0412] Transduction can theoretically effect poolwise recombination, if the transducing phage particles contain
predominantly host genomic DNA rather than phage DNA. If phage DNA is overly represented, then most cells will receive
at least one undesired phage genome. Phage particles generated from locked-in prophage (supra) are useful for this
purpose. A population of cells is infected with an appropriate transducing phage, and the lysate is collected and used
to infect the same starting population. A high multiplicity of infection is employed to deliver multiple genomic fragments
to each infected cell, thereby increasing the chance of producing recombinants containing mutations from more than
two parent genomes. The resulting transductants are recovered under conditions where phage cannot propagate, e.g.,
in the presence of citrate. This population is then screened directly or infected again with phage, with the resulting
transducing particles being used to transduce the first progeny. This would mimic recursive protoplast fusion, multiple
sexual recombination, and in vitro DNA shuffling.
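
The effect of multiplicity of infection on the delivery of multiple genomic fragments can be sketched with a simple model. The following assumes, as is common in phage work but not stated in the specification, that the number of transducing particles received per cell follows a Poisson distribution whose mean equals the multiplicity of infection; the function names are illustrative.

```python
import math


def poisson_pmf(k: int, moi: float) -> float:
    """Probability that a cell receives exactly k particles at mean multiplicity `moi`."""
    return math.exp(-moi) * moi ** k / math.factorial(k)


def fraction_with_at_least(n: int, moi: float) -> float:
    """Fraction of cells receiving n or more particles (Poisson assumption)."""
    return 1.0 - sum(poisson_pmf(k, moi) for k in range(n))


if __name__ == "__main__":
    for moi in (0.1, 1.0, 5.0, 10.0):
        share = fraction_with_at_least(2, moi)
        print(f"MOI {moi:>4}: fraction of cells with >= 2 fragments = {share:.3f}")
```

Under this assumption the fraction of cells receiving two or more fragments rises steeply with multiplicity, which is the reason a high multiplicity of infection favors recombinants drawing on more than two parent genomes.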

DD. METHODS FOR WHOLE GENOME SHUFFLING BY BLIND FAMILY SHUFFLING OF PARSED GENOMES AND
RECURSIVE CYCLES OF FORCED INTEGRATION AND EXCISION BY HOMOLOGOUS RECOMBINATION, AND 
SCREENING FOR IMPROVED PHENOTYPES. 

[0413] In vitro methods have been developed to shuffle single genes and operons, as set forth, e.g., herein. "Family"
shuffling of homologous genes within species and from different species is also an effective method for accelerating
molecular evolution. This section describes additional methods for extending these methods such that they can be
applied to whole genomes.

[0414] In some cases, the genes that encode rate limiting steps in a biochemical process, or that contribute to a
phenotype of interest, are known. This method can be used to target family shuffled libraries to such loci, generating
libraries of organisms with high quality family shuffled libraries of alleles at the locus of interest. An example would be
the evolution of a host chaperonin to more efficiently chaperone the folding of an overexpressed protein
in E. coli.

[0415] The goals of this process are to shuffle homologous genes from two or more species and then to integrate the
shuffled genes into the chromosome of a target organism. Integration of multiple shuffled genes at multiple loci can be
achieved using recursive cycles of integration (generating duplications), excision (leaving the improved allele in the
chromosome) and transfer of additional evolved genes by serially applying the same procedure.

[0416] In the first step, genes to be shuffled are subcloned into suitable bacterial vectors. These vectors can be
plasmids, cosmids, BACs or the like. Thus, fragments from 100 bp to 100 kb can be handled. Homologous fragments
are then "family shuffled" together (i.e., homologous fragments from different species or chromosomal locations are
homologously recombined). As a simple case, homologs from two species (say, E. coli and Salmonella) are cloned,
family shuffled in vitro and cloned into an allele replacement vector (e.g., a vector with a positively selectable marker, a
negatively selectable marker and conditionally active origin of replication). The basic strategy for whole genome family
shuffling of parsed (subcloned) genomes is additionally set forth in Fig. 22.

[0417] The vectors are transfected into E. coli and selected, e.g., for drug resistance. Most drug resistant cells should
arise by homologous recombination between a family shuffled insert and a chromosomal copy of the cloned insert.
Colonies are screened for improved phenotype (e.g., by mass spectroscopy for enzyme activity or small molecule
production, or a chromogenic screen, or the like, depending on the phenotype to be assayed). Negative selection (i.e.,
suc selection) is imposed to force excision of the tandem duplication. Roughly half of the colonies should retain the improved
phenotype. Importantly, this process regenerates a "clean" chromosome in which the wild type locus is replaced with a
family shuffled fragment that encodes a beneficial allele. Since the chromosome is "clean" (i.e., has no vector sequences),
other improved alleles can also be moved into this point on the chromosome by homologous recombination.
[0418] Selection or screening for improved phenotype can occur either after step 3 or step 4 in Figure 22. If selection
or screening takes place after step 3, then the improved allele can be conveniently moved to other strains by, for example,
P1 transduction. One can then regenerate a strain containing the improved allele but lacking vector sequences by
"negative selection" against the suc marker. In subsequent rounds, independently identified improved variants of the
gene can be sequentially moved into the improved strain (e.g., by P1 transduction of the drug marked tandem duplication
above). Transductants are screened for further improvement in phenotype by virtue of receiving the transduced tandem
duplication, which itself contains the family shuffled genetic material. Negative selection is again imposed and the process
of shuffling the improved strain is recursively repeated as desired.

[0419] Although this process was described with reference to targeting a gene or genes of interest, it can be used
"blindly," making no assumptions about which locus is to be targeted. This procedure is set forth in Fig. 23. For example,
the whole genome of an organism of interest is cloned into manageable fragments (e.g., 10 kb for plasmid-based
methods). Homologous fragments are then isolated from related species by the method shown in Figure 23. Forced
recombination with chromosomal homologs creates chimeras (Fig. 22).

EE. METHODS FOR HIGH THROUGHPUT FAMILY SHUFFLING OF GENES 

[0420] For E. coli, cloning the genome in 10 kb fragments requires about 300 clones. The homologous fragments
are isolated, e.g., from Salmonella. This gives roughly three hundred pairs of homologous fragments. Each pair is family
shuffled and the shuffled fragments are cloned into an allele replacement vector. The inserts are integrated into the E.
coli genome as described above. A global screen is made to identify variants with an improved phenotype. This serves
as the base collection of improvements that are to be shuffled to produce a desired strain. The shuffling of these
independently identified variants into one super strain is done as described above.

[0421] Family shuffling has been shown to be an efficient method for creating high quality libraries of genetic variants.
Given a cloned gene from one species, it is of interest to rapidly isolate homologs from other species, and
this process can be rate limiting. For example, if one wants to perform family shuffling on an entire genome, one may
need to construct hundreds to thousands of individual family shuffled libraries.

[0422] In this embodiment, a gene of interest is optionally cloned into a vector in which ssDNA can be made. An
example of such a vector is a phagemid vector with an M13 origin of replication. Genomic DNA or cDNA from a species
of interest is isolated, denatured, annealed to the phagemid, and then enzymatically manipulated to clone it. The cloned
DNA is then used to family shuffle with the original gene of interest. PCR based formats are also available as outlined
in Figure 24. These formats require no intermediate cloning steps, and are, therefore, of particular interest for high
throughput applications.

[0423] Alternatively, the gene of interest can be fished out using purified RecA protein. The gene of interest is PCR
amplified using primers that are tagged with an affinity tag such as biotin, denatured, then coated with RecA protein (or
an improved variant thereof). The coated ssDNA is then mixed with a gDNA plasmid library. Under the appropriate
conditions, such as in the presence of non-hydrolyzable rATP analogs, RecA will catalyze the hybridization of the
RecA-coated gene (ssDNA) with homologous sequences in the plasmid library. The heteroduplex is then affinity purified
from the non-hybridizing plasmids of the gene library by adsorption of the labeled PCR product and its associated
homologous DNA to an appropriate affinity matrix. The homologous DNA is used in a family shuffling reaction for
improvement of the desired function.

[0424] Shuffling the E. coli chaperonin gene DnaJ with other homologs is described below as an example. The example
can be generalized to any other gene, including eukaryotic genes such as plant or animal genes (including mammalian
genes), by following the format described. Fig. 24 provides a schematic outline of the steps to high throughput family
shuffling.

[0425] As a first step, the E. coli DnaJ gene is cloned into an M13 phagemid vector. ssDNA is then produced, preferably
in a dut(-) ung(-) strain so that Kunkel site directed mutagenesis protocols can be applied. Genomic DNA is then isolated
from a non-E. coli source, such as Salmonella and Yersinia pestis. The bacterial genomic DNAs are denatured and
reannealed to the phagemid ssDNA (e.g., about 1 microgram of ssDNA). The reannealed product is treated with an
enzyme such as Mung Bean nuclease that degrades ssDNA as an exonuclease but not as an endonuclease (the nuclease
does not degrade mismatched DNA that is embedded in a larger annealed fragment). The standard Kunkel site directed
mutagenesis protocol is used to extend the fragment and the target cells are transformed with the resulting mutagenized
DNA.

[0426] In a first variation on the above, the procedure is adapted to the situation where the target gene or genes of
interest are unknown. In this variation, the whole genome of the organism of interest is cloned in fragments (e.g., of
about 10 kb each) into a phagemid. Single stranded phagemid DNA is then produced. Genomic DNA from the related
species is denatured and annealed to the phagemids. Mung bean nuclease is used to trim away unhybridized DNA
ends. Polymerase plus ligase is used to fill in the resulting gapped circles. These clones are transformed into a mismatch
repair deficient strain. When the mismatched molecules are replicated in the bacteria, most colonies contain both the
E. coli and the homologous fragment. The two homologous genes are then isolated from the colonies (e.g., either by
standard plasmid purification or colony PCR) and shuffled.

[0427] Another approach to generating chimeras that requires no in vitro shuffling is simply to clone the Salmonella
genome into an allele replacement vector, transform E. coli, and select for chromosomal integrants. Homologous
recombination between Salmonella genes and E. coli homologs generates shuffled chimeras. A global screen is done to
screen for improved phenotypes. Alternatively, recursive transformation and recombination is performed to increase
diversity prior to screening. If colonies with improved phenotypes are obtained, it is verified that the improvement is due
to allele replacement by P1 transduction into a fresh strain and counterscreening for improved phenotype. A collection
of such improved alleles can then be combined into one strain using the methods for whole genome shuffling by blind
family shuffling of parsed genomes as set forth herein. Additionally, once these loci are identified, it is likely that further
rounds of shuffling and screening will yield further improvements. This could be done by cloning the chimeric gene and
then using the methods described in this disclosure to breed the gene with homologs from many different strains of
bacteria.

[0428] In general, the transformants contain clones of the homologue of the target gene (e.g., E. coli DnaJ in the
example above). Mismatch repair in vivo results in a decrease in diversity of the gene. There are at least two solutions
to this. First, transduction can be performed into a mismatch repair deficient strain. Alternatively or in addition, the M13
template DNA can be selectively degraded, leaving the cloned homologue. This can be done using methods similar to
the standard Eckstein site directed mutagenesis technique (General texts which describe general molecular biological
techniques useful herein, including mutagenesis, include Sambrook et al., Molecular Cloning - A Laboratory Manual
(2nd Ed.), Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 1989 ("Sambrook") and Current
Protocols in Molecular Biology, F.M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing
Associates, Inc. and John Wiley & Sons, Inc., (supplemented through 1998) ("Ausubel")).

[0429] This method relies on incorporation of alpha-thiol-modified dNTPs during synthesis of the new strand followed
by selective degradation of the template and resynthesis of the template strand. In one embodiment, the template strand
is grown in a dut(-) ung(-) strain so that uracil is incorporated into the phagemid DNA. After extension as noted above
(and before transformation) the DNA is treated with uracil glycosylase and an apurinic site endonuclease such as Endo
III or Endo IV. The treated DNA is then treated with a processive exonuclease that resects from the resulting gaps while
leaving the other strand intact (as in Eckstein mutagenesis). The DNA is polymerized and ligated. Target cells are then
transformed. This process enriches for clones encoding the homologue which is not derived from the target (i.e., in the
example above, the non-E. coli homologue).

[0430] An analogous procedure is optionally performed in a PCR format. As applied to the DnaJ illustration above,
DnaJ DNA is amplified by PCR with primers that build 30-mer priming sites on each end. The PCR product is denatured and
annealed with an excess of Salmonella genomic DNA. The Salmonella DnaJ gene hybridizes with the E. coli homologue.
After treatment with Mung Bean nuclease, the resulting mismatched hybrid is PCR amplified with the flanking 30-mer
primers. This PCR product can be used directly for family shuffling. See, e.g., Fig. 24.

[0431] As genomics provides an increasing amount of sequence information, it is increasingly possible to directly PCR
amplify homologs with designed primers. For example, given the sequence of the E. coli genome and of a related genome
(i.e., Salmonella), each genome can be PCR amplified with designed primers in, e.g., 5 kb fragments. The homologous
fragments can be put together in a pairwise fashion for shuffling. For genome shuffling, the shuffled products are cloned
into the allele replacement vector and bred into the genome as described supra.

FF. HYPER-RECOMBINOGENIC RECA CLONES

[0432] The invention further provides hyper-recombinogenic RecA proteins (see the examples below). Examples of
such proteins are from clones 2, 4, 5, 6 and 13 shown in Fig. 13. It is fully expected that one of skill can make a variety
of related recombinogenic proteins given the disclosed sequences.

[0433] Clones comprising the sequences in Figs. 12 and 13 are optionally used as the starting point for any of the 
shuffling methods herein, providing a starting point for mutation and recombination to improve the clones which are shown. 
[0434] Standard molecular biological techniques can be used to make nucleic acids which comprise the given nucleic
acids, e.g., by cloning the nucleic acids into any known vector. Examples of appropriate cloning and sequencing
techniques, and instructions sufficient to direct persons of skill through many cloning exercises are found in Berger and
Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology volume 152, Academic Press, Inc., San Diego,
CA (Berger); Sambrook et al. (1989) Molecular Cloning - A Laboratory Manual (2nd ed.) Vol. 1-3, Cold Spring Harbor
Laboratory, Cold Spring Harbor Press, NY (Sambrook); and Current Protocols in Molecular Biology, F.M. Ausubel
et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc.,
(1994 Supplement) (Ausubel). Product information from manufacturers of biological reagents and experimental
equipment also provides information useful in known biological methods. Such manufacturers include the SIGMA chemical
company (Saint Louis, MO), R&D systems (Minneapolis, MN), Pharmacia LKB Biotechnology (Piscataway, NJ),
CLONTECH Laboratories, Inc. (Palo Alto, CA), Chem Genes Corp., Aldrich Chemical Company (Milwaukee, WI), Glen
Research, Inc., GIBCO BRL Life Technologies, Inc. (Gaithersburg, MD), Fluka Chemica-Biochemika Analytika (Fluka
Chemie AG, Buchs, Switzerland), Invitrogen (San Diego, CA), and Applied Biosystems (Foster City, CA), as well as many
other commercial sources known to one of skill.

[0435] It will be appreciated that conservative substitutions of the given sequences can be used to produce nucleic
acids which encode hyperrecombinogenic clones. "Conservatively modified variations" of a particular nucleic acid
sequence refers to those nucleic acids which encode identical or essentially identical amino acid sequences, or where the
nucleic acid does not encode an amino acid sequence, to essentially identical sequences. Because of the degeneracy
of the genetic code, a large number of functionally identical nucleic acids encode any given polypeptide. For instance,
the codons CGU, CGC, CGA, CGG, AGA, and AGG all encode the amino acid arginine. Thus, at every position where
an arginine is specified by a codon, the codon can be altered to any of the corresponding codons described without
altering the encoded polypeptide. Such nucleic acid variations are "silent variations," which are one species of
"conservatively modified variations." Every nucleic acid sequence herein which encodes a polypeptide also describes every
possible silent variation. One of skill will recognize that each codon in a nucleic acid (except AUG, which is ordinarily
the only codon for methionine) can be modified to yield a functionally identical molecule by standard techniques.
Accordingly, each "silent variation" of a nucleic acid which encodes a polypeptide is implicit in any described sequence.
Furthermore, one of skill will recognize that individual substitutions, deletions or additions which alter, add or delete a
single amino acid or a small percentage of amino acids (typically less than 5%, more typically less than 1%) in an encoded
sequence are "conservatively modified variations" where the alterations result in the substitution of an amino acid with
a chemically similar amino acid. Conservative substitution tables providing functionally similar amino acids are well
known in the art. The following six groups each contain amino acids that are conservative substitutions for one another:
1) Alanine (A), Serine (S), Threonine (T); 2) Aspartic acid (D), Glutamic acid (E); 3) Asparagine (N), Glutamine (Q); 4)
Arginine (R), Lysine (K); 5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V); and 6) Phenylalanine (F), Tyrosine
(Y), Tryptophan (W). See also, Creighton (1984) Proteins, W.H. Freeman and Company. Finally, the addition of sequences
which do not alter the encoded activity of a nucleic acid molecule, such as a non-functional sequence, is a conservative
modification of the basic nucleic acid.
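
As an illustration of the two notions just described, the following minimal Python sketch enumerates silent variations for a short coding sequence and checks whether an amino acid replacement falls within one of the six conservative groups. The function names and the three-codon example are hypothetical and chosen only for illustration; the arginine codons and the six groups are taken from the text.

```python
from itertools import product

# The six conservative substitution groups listed in the text (one-letter codes).
CONSERVATIVE_GROUPS = [
    set("AST"), set("DE"), set("NQ"), set("RK"), set("ILMV"), set("FYW"),
]

# The six arginine codons named in the text (RNA alphabet).
ARG_CODONS = ["CGU", "CGC", "CGA", "CGG", "AGA", "AGG"]


def is_conservative(residue_a, residue_b):
    """True if the two residues are identical or share one of the six groups."""
    return residue_a == residue_b or any(
        residue_a in group and residue_b in group for group in CONSERVATIVE_GROUPS
    )


def silent_variants(codon_choices):
    """Enumerate every nucleic acid encoding the same polypeptide.

    codon_choices holds, for each position, the list of synonymous codons.
    """
    return ["".join(codons) for codons in product(*codon_choices)]


# Hypothetical example: Met-Arg-Arg. AUG is the only methionine codon,
# so all variation comes from the two arginine positions.
variants = silent_variants([["AUG"], ARG_CODONS, ARG_CODONS])
print(len(variants))
print(is_conservative("R", "K"), is_conservative("R", "D"))
```

Each arginine position contributes six interchangeable codons, so the number of silent variations grows multiplicatively with sequence length.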

[0436] One of skill will appreciate that many conservative variations of the nucleic acid constructs disclosed yield a
functionally identical construct. For example, due to the degeneracy of the genetic code, "silent substitutions" (i.e.,
substitutions of a nucleic acid sequence which do not result in an alteration in an encoded polypeptide) are an implied
feature of every nucleic acid sequence which encodes an amino acid. Similarly, "conservative amino acid substitutions,"
in which one or a few amino acids in an amino acid sequence of a packaging or packageable construct are substituted with
different amino acids with highly similar properties, are also readily identified as being highly similar to a disclosed
construct. Such conservatively substituted variations of each explicitly disclosed sequence are a feature of the present
invention.

[0437] Nucleic acids which hybridize under stringent conditions to the nucleic acids in the figures are a feature of the
invention. "Stringent hybridization wash conditions" in the context of nucleic acid hybridization experiments such as
Southern and northern hybridizations are sequence dependent, and are different under different environmental
parameters. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in
Biochemistry and Molecular Biology - Hybridization with Nucleic Acid Probes, part I, chapter 2, "Overview of principles of
hybridization and the strategy of nucleic acid probe assays", Elsevier, New York. Generally, highly stringent hybridization
and wash conditions are selected to be about 5°C lower than the thermal melting point (Tm) for the specific sequence
at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the
target sequence hybridizes to a perfectly matched probe. Very stringent conditions are selected to be equal to the Tm
for a particular probe. In general, a signal to noise ratio of 2x (or higher) than that observed for an unrelated probe in
the particular hybridization assay indicates detection of a specific hybridization.
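
The two numerical criteria in this paragraph (a wash temperature about 5°C below the thermal melting point, and a signal at least 2x that of an unrelated probe) can be written out as a minimal Python sketch. The function names are hypothetical, the melting point is assumed to have been determined separately, and the example values are invented for illustration.

```python
def highly_stringent_wash_temperature(tm_celsius, offset_celsius=5.0):
    """Wash temperature about 5 degrees C below the melting point Tm."""
    return tm_celsius - offset_celsius


def is_specific_hybridization(signal, unrelated_probe_signal, min_ratio=2.0):
    """True if the signal is at least min_ratio times the unrelated-probe signal."""
    return signal >= min_ratio * unrelated_probe_signal


# Invented example values: a probe with Tm = 68 C, and two measured signals.
print(highly_stringent_wash_temperature(68.0))
print(is_specific_hybridization(signal=450.0, unrelated_probe_signal=200.0))
```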

[0438] Nucleic acids which do not hybridize to each other under stringent conditions are still substantially identical if
the polypeptides which they encode are substantially identical. This occurs, e.g., when a copy of a nucleic acid is created
using the maximum codon degeneracy permitted by the genetic code.

[0439] Finally, preferred nucleic acids encode hyper-recombinogenic RecA proteins which are at least one order of
magnitude (10 times) as active as a wild-type RecA protein in a standard assay for RecA activity.

GG. recE / recT MEDIATED SHUFFLING IN VIVO

[0440] Like recA, recE and recT (or their homologues, for example the lambda recombination proteins redα and redβ)
can stimulate homologous recombination in vivo. See Muyrers et al. (1999) Nucleic Acids Res 27(6):1555-7 and Zhang
et al. (1998) Nat Genet 20(2):123-8.

[0441] Hyper-recombinogenic recE and recT are evolved by the same method as described for recA. Alternatively,
variants with increased recombinogenicity are selected by their ability to cause recombination between a suicide vector
(lacking an origin of replication) carrying a selectable marker, and a homologous region in either the chromosome or a
stably-maintained episome.

[0442] A plasmid containing recE and recT genes is shuffled (either using these genes as single starting points, or by
family shuffling with, for example, redα and redβ, or other homologous genes identified from available sequence
databases). This shuffled library is then cloned into a vector with a selectable marker and transformed into an appropriate
recombination-deficient strain. The library of cells is then transformed with a second selectable marker, either
borne on a suicide vector or as a linear DNA fragment with regions at its ends that are homologous to a target sequence
(either in the plasmid or in the host chromosome). Integration of this marker by homologous recombination is a selectable
event, dependent on the activity of the recE and recT gene products. The recE / recT genes are isolated from cells in
which homologous recombination has occurred. The process is repeated several times to enrich for the most efficient
variants before the next round of shuffling is performed. In addition, cycles of recombination without selection can be
performed to increase the diversity of a cell population prior to selection.

[0443] Once hyper-recombinogenic recE / recT genes are isolated they are used as described for hyper-recombinogenic
recA. For example, they are expressed (constitutively or conditionally) in a host cell to facilitate homologous
recombination between variant gene fragments and homologues within the host cell. They are alternatively introduced
by microinjection, biolistics, lipofection or other means into a host cell at the same time as the variant genes.
[0444] Hyper-recombinogenic recE / recT (either of bacterial / phage origin, or from plant homologues) are useful for
facilitating homologous recombination in plants. They are, for example, cloned into the Agrobacterium cloning vector,
where they are expressed upon entry into the plant, thereby stimulating homologous recombination in the recipient cell.
[0445] In a preferred embodiment, recE / recT are used and/or generated in mutS strains.

HH. MULTI-CYCLIC RECOMBINATION 


[0446] As noted, protoplast fusion is an efficient means of recombining two microbial genomes. The process
reproducibly results in about 10% of a non-selected population being recombinant chimeric organisms.
[0447] Protoplasts are cells that have been stripped of their cell walls by treatment in hypotonic medium with cell wall
degrading enzymes. Protoplast fusion is the induced fusion of the membranes of two or more of these protoplasts by
fusogenic agents such as polyethylene glycol. Fusion results in cytoplasmic mixing and places the genomes of the fused
cells within the same membrane. Under these conditions recombination between the genomes is frequent.
[0448] The fused protoplasts are regenerated, and, during cell division, single genomes segregate into each daughter
cell. Typically, 10% of these daughter cells have genomes that originate partially from more than one of the original
parental protoplast genomes.
[0449] This result is similar to that of the crossing over of sister chromatids in eukaryotic cells during prophase of
meiosis II. The percentage of daughter cells that are recombinant is just lower after protoplast fusion. While protoplast
fusion does result in efficient recombination, the recombination predominantly occurs between two cells as in sexual
recombination.

[0450] In order to efficiently generate whole genome shuffled libraries, daughter cells having genetic
information originating from multiple parents are made.

[0451] In vitro DNA shuffling results in the efficient poolwise recombination of multiple homologous DNA sequences.
The reassembly of full length genes from a mixed pool of small gene fragments requires multiple annealing and elongation
cycles, the thermal cycles of the primerless PCR reaction. During each thermal cycle, many pairs of fragments anneal
and are extended to form a combinatorial population of larger chimeric DNA fragments. After the first cycle of reassembly,
chimeric fragments contain sequences originating from two different parent genes. This is similar to the result of a single
sexual cycle within a population, pairwise cross, or protoplast fusion. During the second cycle, these chimeric fragments
can anneal with each other, or with other small fragments, resulting in chimeras originating from up to four different
parental sequences.
[0452] This second cycle is analogous to the entire progeny from a single sexual cross inbreeding with itself. Further
cycles will result in chimeras originating from 8, 16, 32, etc. parental sequences and are analogous to further inbreedings
of the progeny population. The power of in vitro DNA shuffling is that a large combinatorial library can be generated from
a single pool of DNA fragments reassembled by these recursive pairwise "matings." As described above, in vivo shuffling
strategies, such as protoplast fusion, result in a single pairwise mating reaction. Thus, to generate the level of diversity
obtained by in vitro methods, in vivo methods are carried out recursively. That is, a pool of organisms is recombined
and the progeny pooled, without selection, and then recombined again. This process is repeated for sufficient cycles to
result in progeny having multiple parental sequences.
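
The doubling described above (2, 4, 8, ... parental sequences per chimera) can be illustrated with a minimal Python sketch. It is an idealized upper bound, not a model of real recombination frequencies: it assumes every pair in the pool mates each cycle and that every recombinant retains a contribution from both partners. The function name is hypothetical.

```python
def recursive_pooled_mating(n_parents, n_cycles):
    """Track the largest number of parents represented in any one progeny.

    Each individual is modeled as the set of parents that contributed to it.
    One cycle recombines every pair in the pool (no selection between cycles).
    """
    pool = {frozenset([i]) for i in range(n_parents)}
    max_parents_per_cycle = []
    for _ in range(n_cycles):
        # Pairwise "mating": a recombinant carries the union of both lineages.
        pool = {a | b for a in pool for b in pool}
        max_parents_per_cycle.append(max(len(lineage) for lineage in pool))
    return max_parents_per_cycle


print(recursive_pooled_mating(n_parents=4, n_cycles=3))
print(recursive_pooled_mating(n_parents=8, n_cycles=3))
```

Under these assumptions a single pairwise fusion can never yield a four-parent progeny, while two or more recursive cycles can, which is the rationale for repeating the fusion without intervening selection.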

[0453] Described below is a method used to shuffle four strains of Streptomyces coelicolor. From the initial four strains,
each containing a unique nutritional marker, three to four rounds of recursive pooled protoplast fusion were sufficient to
generate a population of shuffled organisms containing all 16 possible combinations of the four markers. This represents
a 10^ fold improvement in the generation of four parent progeny as compared to a single pooled fusion of the four strains.
[0454] As set forth in Figure 31, protoplasts were generated from several strains of S. coelicolor, pooled and fused.
Mycelia were regenerated and allowed to sporulate. The spores were collected, allowed to grow into mycelia, formed
into protoplasts, pooled and fused, and the process was repeated for three to four rounds. The resulting spores were then
subjected to screening.

[0455] The basic protocol for generating a whole genome shuffled library from four S. coelicolor strains, each having
one of four distinct markers, was as follows. Four mycelial cultures, each of a strain having one of four different markers,
were grown to early stationary phase. The mycelia from each were harvested by centrifugation and washed. Protoplasts
from each culture were prepared as follows.
[0456] Approximately 10^ S. coelicolor spores were inoculated into 50 ml YEME with 0.5% glycine in a 250 ml baffled
flask. The spores were incubated at 30°C for 36-40 hours in an orbital shaker. Mycelial growth was verified using a microscope.
Some strains needed an additional day of growth. The culture was transferred into a 50 ml tube and centrifuged at 4,000
rpm for 10 min. The mycelium was washed twice with 10.3% sucrose and centrifuged at 4,000 rpm for 10 min (mycelium
can be stored at -80°C after the wash). 5 ml of lysozyme was added to the ~0.5 g of mycelium pellet. The pellet was suspended
and incubated at 30°C for 20-60 min, with gentle shaking every 10 min. Protoplasting was checked under the microscope
every 20 min. Once the majority were protoplasts, protoplasting was stopped by adding 10 ml of P buffer. The protoplasts
were filtered through cotton and spun down at 3,000 rpm for 7 min at room temperature. The supernatant
was discarded and the protoplasts gently resuspended in a suitable amount of P buffer according to the pellet size
(usually about 500 µl). Ten-fold serial dilutions were made in P buffer, and the protoplasts counted at a 10^-2 dilution.
Protoplasts were adjusted to 10^ protoplasts per ml.

[0457] The protoplasts from each culture were quantitated by microscopy. 10^ protoplasts from each culture were mixed
in the same tube, washed, and then fused by the addition of 50% PEG. The fused protoplasts were diluted and plated
on regeneration medium and incubated until the colonies were sporulating (four days). Spores were harvested and washed.
These spores represent a pool of all the recombinants and parents from the fusion. A sample of the pooled spores was
then used to inoculate a single liquid culture. The culture was grown to early stationary phase, the mycelia harvested,
and protoplasts prepared. 10^ protoplasts from this "mycelial library" were then fused with themselves by the addition
of 50% PEG. The protoplast fusion/regeneration/harvesting/protoplast preparation steps were repeated two times. The
spores resulting from the fourth round of fusion were considered the "whole genome shuffled library" and they were
screened for the frequency of the 16 possible combinations of the four markers. The results from each round of fusion
are shown in Figure 33 and in the following table.

[0458] The results of the shuffling procedure are set forth in Figure 33. In particular, adding rounds of recombination
prior to selection produced significant increases in the number of clones which incorporated all four of the relevant
selectable markers, indicating that the population became increasingly diverse by recursive pooling and sporulation.
Additional results are set forth in the following table.
[0459] The four strains of the four parent shuffling were each auxotrophic for three and prototrophic for one of four
possible nutritional markers: arginine (A), cystine (C), proline (P), and/or uracil (U). Spores from each fusion were plated
in each of the 16 possible combinations of these four nutrients, and the percent of the population growing on a particular
medium was calculated as the ratio of those colonies from a selective plate to those growing on a plate having all four
nutrients (all variants grow on the medium having all four nutrients, thus the colonies from this plate represent the
total viable population). The corrected percentages for each of the no, one, two, and three marker phenotypes were
determined by subtracting the percentage of cells having additional markers that might grow on the medium having
"unnecessary" nutrients. For example, the number of colonies growing on no additional nutrients (the prototroph) was
subtracted from the number of colonies growing on any plate requiring nutrients.
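
The correction just described can be sketched in Python as follows. This is one reading of the procedure, offered under stated assumptions: a cell grows on a plate whenever all nutrients it requires are supplied, so the fraction growing on a given nutrient combination includes every genotype that needs only a subset of those nutrients, and those are subtracted out. The function names and the example fractions are hypothetical and are not data from the experiment.

```python
from itertools import combinations

NUTRIENTS = "ACPU"  # arginine, cystine, proline, uracil


def all_subsets(items):
    """Yield every combination of the items (16 for four nutrients)."""
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def corrected_fractions(growth):
    """Convert 'grows on this plate' fractions into exact-requirement fractions.

    growth maps each supplied-nutrient set to the fraction of the viable
    population that forms colonies on that plate.
    """
    exact = {}
    for supplied in sorted(all_subsets(NUTRIENTS), key=len):
        # Subtract genotypes that need fewer nutrients than the plate supplies.
        less_demanding = sum(v for s, v in exact.items() if s < supplied)
        exact[supplied] = growth[supplied] - less_demanding
    return exact


# Hypothetical population used only to exercise the calculation.
true_fractions = {s: 0.0 for s in all_subsets(NUTRIENTS)}
true_fractions[frozenset()] = 0.1          # prototroph, needs nothing
true_fractions[frozenset("A")] = 0.2       # requires arginine only
true_fractions[frozenset("AC")] = 0.3      # requires arginine and cystine
true_fractions[frozenset("ACPU")] = 0.4    # requires all four nutrients
growth = {
    s: sum(v for r, v in true_fractions.items() if r <= s) for s in true_fractions
}
estimated = corrected_fractions(growth)
print(round(estimated[frozenset("AC")], 3))
```

Processing the plates from fewest to most supplied nutrients means each subtraction uses only values that have already been corrected.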

II. WHOLE GENOME SHUFFLING THROUGH ORGANIZED HETERODUPLEX SHUFFLING

[0460] A new procedure to optimize phenotypes of interest by heteroduplex shuffling of cosmid libraries of the
organism of choice is provided. This procedure does not require protoplast fusion and is applicable to bacteria for which
well-established genetic systems are available, including cosmid cloning, transformation, in vitro packaging/transfection
and plasmid transfer/mobilization. Microorganisms that can be improved by these methods include Escherichia coli,
Pseudomonas aeruginosa, Pseudomonas putida, Pseudomonas spp., Rhizobium spp., Xanthomonas spp., and other
gram-negative organisms. This method is also applicable to Gram-positive microorganisms.

[0461] A basic procedure for whole genome shuffling through organized heteroduplex shuffling is set forth in Figure 34.
[0462] In step A, chromosomal DNA of the organism to be improved is digested with suitable restriction enzymes and
ligated into a cosmid. The cosmid used for cosmid-based heteroduplex guided WGS has at least two rare restriction
enzyme recognition sites (e.g. Sfr and NotI) to be used for linearization in subsequent steps. Sufficient cosmids to
represent the complete chromosome are purified and stored in 96-well microtiter dishes. In step B, small samples of the
library are mutagenized in vitro using hydroxylamine or other mutagenic chemicals. In step C, a sample from each well
of the mutagenized collection is used to transfect the target cells. In step D, the transfectants are assayed (as a pool
from each mutagenized sample-well) for phenotypic improvements. Positives from this assay indicate that a cosmid
from a particular well can confer phenotypic improvements and thus contains large genomic fragments that are suitable
targets for heteroduplex mediated shuffling. In step E, the transfected cells harboring a mutant library of the identified
cosmid(s) are separated by plating on solid media and screened for independent mutants conferring an improved
phenotype. In step F, DNA from positive cells is isolated and pooled by origin. In step G, the selected cosmid pools are
divided so that one sample can be digested with Sfr and the other with NotI. These samples are pooled, denatured,
reannealed, and religated.

[0463] In step H, target cells are transfected with the resulting heteroduplexes and propagated to allow "recombination"
to occur between the strands of the heteroduplexes in vivo. The transfectants can be screened (the population will
represent the pairwise recombinants) or, commonly, as represented by step I, the recombined cosmids are further
shuffled by recursive in vitro heteroduplex formation and in vivo recombination (to generate a complete combinatorial
library of the possible mutations) prior to screening. An additional mutagenesis step could also be added for increased
diversity during the shuffling process.
[0464] In step J, once several cosmids harboring different distributed loci have been improved, they are combined
into the same host by chromosome integration. This organism can be used directly or subjected to a new round of
heteroduplex guided whole genome shuffling.

EXAMPLES 

[0465] The following examples are offered to illustrate, but not to limit, the present invention. Essentially equivalent
variations upon the exact procedures set forth will be apparent to one of skill upon review of the present disclosure.

A. EXAMPLE 1: EVOLVING HYPER-RECOMBINOGENIC RECA

[0466] RecA protein is implicated in most E. coli homologous recombination pathways. Most mutations in recA inhibit
recombination, but some have been reported to increase recombination (Kowalczykowski et al., Microbiol. Rev., 58,
401-465 (1994)). The following example describes evolution of recA to acquire hyper-recombinogenic activity useful in
in vivo shuffling formats.

[0467] Hyperrecombinogenic RecA was selected using a modification of a system developed by Shen et al. (Genetics
112, 441-457 (1986); Shen et al., Mol. Gen. Genet. 218, 358-360 (1989)) to measure the effect of substrate length and
homology on recombination frequency. Shen & Huang's system used plasmids and bacteriophages with small (31-430
bp) regions of homology at which the two could recombine. In a restrictive host, only phage that had incorporated the
plasmid sequence were able to form plaques.

[0468] For shuffling of recA, endogenous recA and mutS were deleted from host strain MC1061. In this strain, no
recombination was seen between plasmid and phage. E. coli recA was then cloned into two of the recombination vectors
(Bp221 and πMT631c18). Plasmids containing cloned RecA were able to recombine with homologous phage: λV3 (430
bp identity with Bp221), λV13 (430 bp stretch of 89% identity with Bp221) and λlink H (31 bp identity with πMT631c18,
except for 1 mismatch at position 18).

[0469] The cloned RecA was then shuffled in vitro using the standard DNase-treatment followed by PCR-based
reassembly. Shuffled plasmids were transformed into the non-recombining host strain. These cells were grown up
overnight, infected with phage λV3, λV13 or λlink H, and plated onto NZCYM plates in the presence of a 10-fold excess
of MC1061 lacking plasmid. The more efficiently a recA allele promotes recombination between plasmid and phage, the
more highly the allele is represented in the bacteriophage DNA. Consequently, harvesting all the phage from the plates
and recovering the recA genes selects for the most recombinogenic recA alleles.

[0470] Recombination frequencies for wild type and a pool of hyper-recombinogenic RecA after 3 rounds of shuffling
were as follows:

Cross                  Wild Type      Hyper Recom
BP221 x V3             6.5 x 10^-4    3.3 x 10^-2
BP221 x V13            2.2 x 10^      1.0 x 10^
πMT631c18 x link H     8.7 x 10^-6    4.7 x 10^-5

These results indicate a 50-fold increase in recombination for the 430 bp substrate, and a 5-fold increase for the 31 bp
substrate.

[0471] The recombination frequency between BP221 and V3 for five individual clonal isolates is shown below, and
the DNA and protein sequences and alignments thereof are included in Figs. 12 and 13.
Wildtype: 1.6 x 10^-4
Clone 2: 9.8 x 10^-3 (61 x increase)
Clone 4: 9.9 x 10^-3 (62 x increase)
Clone 5: 6.2 x 10^-3 (39 x increase)
Clone 6: 8.5 x 10^-3 (53 x increase)
Clone 13: 0.019 (116 x increase)

Clones 2, 4, 5, 6 and 13 can be used as the substrates in subsequent rounds of shuffling, if further improvement in recA
is desired. Not all of the variations from the wildtype recA sequence necessarily contribute to the hyperrecombinogenic
phenotype. Silent variations can be eliminated by backcrossing. Alternatively, variants of recA incorporating individual
points of variation from wildtype at codons 5, 18, 156, 190, 236, 268, 271, 283, 304, 312, 317, 345 and 353 can be tested
for activity.
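
The fold increases listed for the clonal isolates are simply the ratio of each clone's recombination frequency to the wildtype frequency. A minimal Python sketch of that arithmetic follows for clones 2, 4, 5 and 6; it assumes the clone frequencies are on the order of 10^-3, an exponent inferred here from the reported fold increases rather than read directly from the listing. The names used are hypothetical.

```python
WILDTYPE_FREQUENCY = 1.6e-4

# Assumed values: mantissas from the listing, 10^-3 exponent inferred from
# the reported fold increases.
CLONE_FREQUENCIES = {
    "Clone 2": 9.8e-3,
    "Clone 4": 9.9e-3,
    "Clone 5": 6.2e-3,
    "Clone 6": 8.5e-3,
}


def fold_increase(frequency, reference=WILDTYPE_FREQUENCY):
    """Ratio of a clone's recombination frequency to the reference frequency."""
    return frequency / reference


for name, frequency in CLONE_FREQUENCIES.items():
    print(f"{name}: {round(fold_increase(frequency))} x increase")
```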

B. EXAMPLE 2: WHOLE ORGANISM EVOLUTION FOR HYPER-RECOMBINATION

[0472] The possibility of selection for an E. coli strain with an increased level of recombination was indicated from
phenotypes of wild-type, ΔrecA, mutS and ΔrecA mutS strains following exposure to mitomycin C, an inter-strand
cross-linking agent of DNA.

[0473] Exposure of E. coli to mitomycin C causes inter-strand cross-linking of DNA, thereby blocking DNA replication.
Repair of the inter-strand DNA cross links in E. coli occurs via a RecA-dependent recombinational repair pathway
(Friedberg et al., in DNA Repair and Mutagenesis (1995) pp. 101-232). Processing of cross-links during repair results
in occasional double-strand DNA breaks, which too are repaired by a RecA-dependent recombinational route.
Accordingly, recA- strains are significantly more sensitive than wildtype strains to mitomycin C exposure. In fact, mitomycin C
is used in simple disk-sensitivity assays to differentiate between RecA+ and RecA- strains.

[0474] In addition to its recombinogenic properties, mitomycin C is a mutagen. Exposure to DNA damaging agents,
such as mitomycin C, typically results in the induction of the E. coli SOS regulon, which includes products involved in
error-prone repair of DNA damage (Friedberg et al., 1995, supra, at pp. 465-522).

[0475] Following phage P1-mediated generalized transduction of the ΔrecA-srl::Tn10 allele (a nonfunctional allele)
into wild-type and mutS E. coli, tetracycline-resistant transductants were screened for a recA- phenotype using the
mitomycin C-sensitivity assay. It was observed that, in LB overlays with a 1/4 inch filter disk saturated with 10 µg of mitomycin
C, following 48 hours at 37°C, growth of the wild-type and mutS strains was inhibited within a region with a radius of
about 10 mm from the center of the disk. DNA cross-linking at high levels of mitomycin C saturates recombinational
repair, resulting in lethal blockage of DNA replication. Both strains gave rise to occasional colony forming units within
the zone of inhibition, although the frequency of colonies was ~10-20-fold higher in the mutS strain. This is presumably
due to the increased rate of spontaneous mutation of mutS backgrounds. A side-by-side comparison demonstrated that
the ΔrecA and ΔrecA mutS strains were significantly more sensitive to mitomycin C, with growth inhibited in a region
extending about 15 mm from the center of the disk. However, in contrast to the recA+ strains, no Mit^r individuals were
seen within the region of growth inhibition, not even in the mutS ΔrecA strain. The appearance of Mit^r individuals in recA+
backgrounds, but not in ΔrecA backgrounds, indicates that Mit^r is dependent upon a functional RecA protein and suggests
that Mit^r may result from an increased capacity for recombinational repair of mitomycin C-induced damage.
[0476] Mutations which lead to increased capacity for RecA-mediated recombinational repair may be diverse,
unexpected, unlinked, and potentially synergistic. A recursive protocol alternating selection for Mit^r and chromosomal shuffling
evolves individual cells with a dramatically increased capacity for recombination.

[0477] The recursive protocol is as follows. Following exposure of a mutS strain to mitomycin C, Mit^r individuals are
pooled and cross-bred (e.g., via Hfr-mediated chromosomal shuffling, split-pool generalized transduction, or protoplast
fusion). Alleles which result in Mit^r, and presumably result in an increased capacity for recombinational repair, are shuffled
among the population in the absence of mismatch repair. In addition, error-prone repair following exposure to mitomycin
C can introduce new mutations for the next round of shuffling. The process is repeated using increasingly more stringent
exposures to mitomycin C. A number of parallel selections are performed in the first round as a means of generating a
variety of alleles. Optionally, recombinogenicity of isolates can be monitored for hyper-recombination using a plasmid x
plasmid assay or a chromosome x chromosome assay (e.g., that of Konrad, J. Bacteriol. 130, 167-172 (1977)).
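
The alternation of selection and shuffling described above can be pictured with a toy simulation. This is only an illustrative sketch, not the laboratory protocol: genomes are modelled as lists of 0/1 alleles, "resistance" is assumed to be the simple count of 1-alleles, and the names `shuffle_pool`, `recursive_selection`, `keep_fraction` and `mutation_rate` are hypothetical.

```python
import random


def shuffle_pool(survivors, pool_size, rng):
    """Cross-breed survivors: each offspring takes every locus from one of two random parents."""
    offspring = []
    while len(offspring) < pool_size:
        parent_a, parent_b = rng.sample(survivors, 2)
        offspring.append([rng.choice(pair) for pair in zip(parent_a, parent_b)])
    return offspring


def recursive_selection(population, rounds, rng, keep_fraction=0.5, mutation_rate=0.01):
    """Alternate selection and shuffling, raising the stringency each round."""
    for round_index in range(rounds):
        # Selection: keep a shrinking top fraction (stands in for harsher exposure).
        ranked = sorted(population, key=sum, reverse=True)
        n_keep = max(2, int(len(ranked) * keep_fraction / (round_index + 1)))
        survivors = ranked[:n_keep]
        # Shuffling: refill the pool with recombinants of the survivors.
        children = shuffle_pool(survivors, len(population) - n_keep, rng)
        # A low rate of new mutation (stands in for error-prone repair).
        for child in children:
            for locus in range(len(child)):
                if rng.random() < mutation_rate:
                    child[locus] ^= 1
        population = survivors + children
    return population


rng = random.Random(0)
start = [[rng.randint(0, 1) for _ in range(20)] for _ in range(40)]
final = recursive_selection(start, rounds=5, rng=rng)
```

Because survivors are carried forward unchanged, the best score in the pool can never fall from one round to the next; in this toy model, recombination among survivors is what allows alleles from different individuals to be combined in one genome.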

C. EXAMPLE 3: WHOLE GENOME SHUFFLING OF STREPTOMYCES COELICOLOR TO IMPROVE THE PRODUCTION
OF γ-ACTINORHODIN.

[0478] To improve the production of the secondary metabolite γ-actinorhodin from S. coelicolor, the entire genome of
this organism is shuffled either alone or with its close relative S. lividans. In the first procedure described below, genetic
diversity arises from random mutations generated by chemical or physical means. In the second procedure, genetic
diversity arises from the natural diversity existing between the genomes of S. coelicolor and S. lividans.
[0479] Spore suspensions of S. coelicolor are resuspended in sterile water and subjected to UV mutagenesis such
that 1% of the spores survive (~600 "energy" units using a Stratalinker, Stratagene), and the resulting mutants are "grown
out" on sporulation agar. Individual spores represent uninucleate cells harboring different mutations within their genome.
Spores are collected, washed, and plated on solid medium, preferably soy agar, R5, or other rich medium that results
in sporulating colonies. Colonies are then imaged and picked randomly using an automated colony picker, for example
the Q-bot (Genetix). Alternatively, colonies producing larger or darker halos of blue pigment are picked in addition or
preferentially.

[0480] The colonies are inoculated into 96 well microtitre plates containing [fraction illegible] x YEME medium
([volume illegible]/well). Two sterile 3 mm glass beads are added to each well, and the plates are shaken at 150-250 rpm
at 30°C in a humidified incubator. The plates are incubated up to 7 days and the cell supernatants are assayed for
γ-actinorhodin production.
[0481] To assay, 50 µL of supernatant is added to 100 µL of distilled water in a 96 well polypropylene microtitre plate,
and the plate is centrifuged at 4000 rpm to pellet the mycelia. [Volume illegible] of cleared supernatant is then removed and
added to a flat bottom polystyrene 96 well microtitre plate containing [volume illegible] 1 M KOH in each well. The resulting
plates are then read in a microtitre plate reader measuring the absorbance at 654 nm of the individual samples as a measure
of the content of γ-actinorhodin.
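
As a rough illustration of how plate-reader output from such an assay might be triaged, the sketch below flags wells whose absorbance at 654 nm exceeds the wild-type controls. The cut-off rule (control mean plus a chosen number of standard deviations) is an assumption made for this example; the text requires only that production be significantly higher than wild type. The function name `pick_hits`, the parameter `n_sd` and the well readings are hypothetical.

```python
from statistics import mean, stdev


def pick_hits(absorbance, control_wells, n_sd=3.0):
    """Return (hits, cutoff): wells whose A654 exceeds the control mean by n_sd standard deviations."""
    controls = [absorbance[well] for well in control_wells]
    if len(controls) < 2:
        raise ValueError("need at least two control wells to estimate spread")
    cutoff = mean(controls) + n_sd * stdev(controls)
    hits = [well for well, value in absorbance.items()
            if well not in control_wells and value > cutoff]
    # Rank the strongest producers first.
    hits.sort(key=lambda well: absorbance[well], reverse=True)
    return hits, cutoff


# Hypothetical readings: A1-A3 hold wild-type S. coelicolor, B1-B3 hold mutants.
readings = {"A1": 0.20, "A2": 0.22, "A3": 0.18, "B1": 0.21, "B2": 0.60, "B3": 0.35}
hits, cutoff = pick_hits(readings, control_wells={"A1", "A2", "A3"})
```

With these made-up readings the cut-off is about 0.26, so wells B2 and B3 are carried forward and B1 is not.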

[0482] Mycelia from cultures producing γ-actinorhodin at levels significantly higher than that of wildtype S. coelicolor
are then isolated. These are propagated on solid sporulation medium, and spore preparations of each improved mutant
are made. From these preparations protoplasts of each of the improved mutants are generated, pooled together, and
fused (as described in Genetic Manipulation of Streptomyces: A Laboratory Manual, Hopwood, D.A., et al.). The fused
protoplasts are regenerated and allowed to sporulate. Spores are collected and either plated on solid medium for further
picking and screening, or, to increase the representation of multiparent progeny, are used to generate protoplasts and
fused again (or several times, as described previously for methods to effect poolwise recombination) before further picking
and screening.

[0483] Further improved mutants result from the combination of two or more mutations that have additive or synergistic
effects on γ-actinorhodin production. Further improved mutants can be again mated by protoplast poolwise fusion, or
they can be exposed to random mutagenesis to create a new population of cells to be screened and mated for further
improvements.

[0484] As an alternative to random mutagenesis as a source of genetic diversity, natural diversity can be employed. In
this case, protoplasts generated from wildtype S. coelicolor and S. lividans are fused together. Spores from the regenerated
progeny of this mating are then either repetitively fused and regenerated to create additional diversity, or they
are separated on solid medium, picked, and screened for enhanced production of γ-actinorhodin. As before, the improved
subpopulation are mated together to identify further improved family shuffled organisms.

D. EXAMPLE 4: A HIGH THROUGHPUT ACTINORHODIN ASSAY

[0485] Additional details on a high-throughput shuffling actinorhodin assay used to select mycelia are set forth in
Figure 32. In brief, shufflants were picked by standard automated procedures using a Q-bot robotic system and transferred
to standard 96 well plates. After incubation at 30°C for 7 days, the resulting mycelia were centrifuged, and a sample of
cell supernatant was removed and mixed with 0.1 M KOH in a 96 well plate and the absorbance read at 654 nm. The
best positive clones were selected and grown in shake flasks.

[0486] Approximately 10^[exponent illegible] protoplasts were centrifuged at [speed illegible] for 7 min. When more than
one strain was used, equal numbers of protoplasts were obtained from each strain. Most of the buffer was removed and the
pellet suspended in the remaining buffer (~25 µl total volume) by gentle flicking. [Volume illegible] of 50% PEG1000 was
added and mixed with the protoplasts by gently pipetting in and out 2 times. The mixture was then incubated for 2 minutes.
0.5 ml of P buffer was added and gently mixed. (This is the fusion at a dilution of 10^-1.) A ten-fold serial dilution was
performed in P buffer. After 2 minutes, dilutions were plated at 10^-1, 10^-[illegible] and 10^-[illegible] onto R5 plates with
50 µl of each, 2 plates for each dilution (for plating, ~20 3 mm glass beads were used, with gentle shaking). As a first control,
for regeneration of protoplasts, the same number of protoplasts was used as above, adding P buffer to a total of 1 ml (this is
the regeneration at a dilution of 10^-1). The mixture was further diluted (10X) in P buffer. The dilutions were plated at 10^-3,
10^-[illegible] and 10^-[illegible] onto R5 plates with 50 µl of each. As a second control (a check for non-protoplast
background), the same number of protoplasts as above was used, adding 0.1% SDS to a total of 1 ml (this is the background
at a dilution of 10^-1). After further 10X dilution in 0.1% SDS, the dilutions were plated at 10^-1, 10^-2 and 10^-[illegible]
onto R5 plates with 50 µl of each. The plates were air dried and incubated at 30°C for 3 days.

[0487] The number of colonies was counted from each plate (those that were countable), using the number of regenerated
protoplasts as 100% and calculating the percentage of background (usually less than one) and of fusion survival
(usually greater than 10). The fusion plates were incubated at 30°C for [number illegible] more days until all colonies were
well sporulated. Spores were harvested from those plates having fewer than [number illegible] colonies. Spores were filtered
through cotton and washed once with water, suspended in 20% glycerol and counted. These spores are used for further
study, culture inoculation or simply stored at -20°C.
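
The percentage calculation described above is simple arithmetic and can be sketched as follows. The sketch assumes that equal volumes were plated for every sample (50 µl each in the procedure above), so only the dilution factors need correcting. The function names and the colony counts are hypothetical and serve only to show the calculation.

```python
def scaled_count(colonies, dilution_exponent):
    """Scale a plate count back to the undiluted suspension.

    dilution_exponent is the power of ten of the plated dilution,
    e.g. -3 for a 10^-3 dilution.
    """
    return colonies * 10 ** (-dilution_exponent)


def percent_of_regeneration(colonies, dilution_exponent,
                            regen_colonies, regen_exponent):
    """Express a count as a percentage of the regeneration control (taken as 100%)."""
    reference = scaled_count(regen_colonies, regen_exponent)
    if reference == 0:
        raise ValueError("regeneration control has no colonies")
    return 100.0 * scaled_count(colonies, dilution_exponent) / reference


# Hypothetical counts, for illustration only.
fusion_survival = percent_of_regeneration(150, -3, 120, -4)  # fusion plate
background = percent_of_regeneration(6, -1, 120, -4)         # SDS control plate
```

With these invented counts the fusion survival is 12.5% and the background is 0.005%, which is the pattern the text describes as typical (background below one percent, fusion survival above ten).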

E. EXAMPLE 4: WHOLE GENOME SHUFFLING OF RHODOCOCCUS FOR TWO-PHASE REACTION CATALYSIS

[0488] This example provides an example of how to apply the techniques described herein to technologies that allow
the generic improvement of biotransformations catalyzed by whole cells. Rhodococcus was selected as an initial target
because it is both representative of systems in which molecular biology is rudimentary (as is common in whole cell
catalysts, which are generally selected by screening environmental isolates), and because it is an organism that can
catalyze two-phase reactions.

[0489] The goal of whole genome shuffling of Rhodococcus is an increase in flux through any chosen pathway.
The substrate specificity of the pathway can be altered to accept molecules which are not currently substrates. Each of
these features can be selected for during whole genome shuffling.

[0490] During whole genome shuffling, libraries of shuffled enzymes and pathways are made and transformed into
Rhodococcus and screened, preferably by high-throughput assay, for improvements in the target phenotype, e.g., by
mass spectroscopy for measuring the product.

[0491] As noted above, the chromosomal context of genes can have dramatic effects on their activities. Cloning of
the target genes onto a small plasmid in Rhodococcus can dramatically reduce the overall pathway activity (by a factor
of 5- to 10-fold or more). Thus, the starting point for DNA shuffling of a pathway (on a plasmid) can be 10-fold lower than
the activity of the wild-type strain. By contrast, integration of the genes into random sites in the Rhodococcus chromosome
can result in a significant (5- to 10-fold) increase in activity. A similar phenomenon was observed in the recent directed
evolution in E. coli of an arsenate resistance operon (originally from Staphylococcus aureus) by DNA shuffling. Shuffling
of this plasmid produced sequence changes that led to efficient integration of the operon into the E. coli chromosome.
Of the total 50-fold increase in arsenate resistance obtained by directed evolution of the three gene pathway, approximately
10-fold resulted from this integration into the chromosome. The position within the chromosome is also likely to
be important: for example, sequences close to the replication origin have an effectively higher gene dosage and therefore
greater expression level.

[0492] In order to fully exploit unpredictable chromosomal position effects, and to incorporate them into a directed
evolution strategy which utilizes multiple cycles of mutation, recombination and selection, genes are manipulated in vitro
and then transferred to an optimal chromosomal position. Recombination between plasmid and chromosome occurs in
two different ways. Integration takes place at a position where there is significant sequence homology between plasmid
and chromosome, i.e., by homologous recombination. Integration also takes place where there is no apparent sequence
identity, i.e., by non-homologous recombination. These two recombination mechanisms are effected by different cellular
machineries and have different potential applications in directed evolution.

[0493] To combine the increase in activity that resulted from gene duplication and chromosomal integration of the
target pathway with the powerful technique of DNA shuffling, libraries of shuffled genes are made in vitro, and integrated
into the chromosome in place of the wild-type genes by homologous recombination. Recombinants are then screened
for increased activity. This process is optionally made recursive as discussed herein. The best Rhodococcus variants
are pooled, and the pool divided in two. Genes are cloned out of one half of the pool by PCR, shuffled together and
re-integrated into the chromosomes of the other half of the pool by homologous recombination. Recombinants are once
again screened, the best taken and pooled, and the process optionally repeated.

[0494] Sometimes there are complex interactions between enzymes catalyzing successive reactions in a pathway.
Sometimes the presence of one enzyme can adversely affect the activities of others in the pathway. This can be the
result of protein-protein interactions, or inhibition of one enzyme by the product of another, or an imbalance of primary
or secondary metabolism.

[0495] This problem is overcome by DNA shuffling, which produces solutions in the target gene cluster that bring
about improvements in whatever trait is screened. An alternative approach, which can solve not only this problem, but
also anticipated future rate limiting steps such as supply of reducing power and substrate transportation, is complementation
by overexpression of other as yet unknown genomic sequences.

[0496] A library of Rhodococcus genomic DNA in a multicopy Rhodococcus vector such as pRC1 is first made. This
is transformed into Rhodococcus and transformants are screened for increases in the desired phenotype. Genomic
fragments which result in increased pathway activity are evolved by DNA shuffling to further increase their beneficial
effect on a selected property. This approach requires no sequence information, nor any knowledge or assumptions about
the nature of protein or pathway interactions, or even of the rate-limiting step; it relies only on detection of the desired
phenotype. This sort of random cloning and subsequent evolution by DNA shuffling of positively interacting genomic
sequences is extremely powerful and generic. A variety of sources of genomic DNA are used, from isogenic strains to
more distantly related species with potentially desirable properties. In addition, the technique is, in principle, applicable
to any microorganism for which the molecular biology basics of transformation and cloning vectors are available, and
for any property which can be assayed, preferably in a high-throughput format.

[0497] Homologous recombination within the chromosome is used to circumvent the limitations of plasmid-evolution
and size restrictions, and is optionally used to alter central metabolism. The strategy is similar to that described above
for shuffling genes within their chromosomal context, except that no in vitro shuffling occurs. Instead, the parent strain
is treated with mutagens such as ultraviolet light or nitrosoguanidine, and improved mutants are selected. The improved
mutants are pooled and split. Half of the pool is used to generate random genomic fragments for cloning into a homologous
recombination vector. Additional genomic fragments are derived from related species with desirable properties (in this
case higher metabolic rates and the ability to grow on cheaper carbon sources). The cloned genomic fragments are
homologously recombined into the genomes of the remaining half of the mutant pool, and variants with improved phenotypes
are selected. These are subjected to a further round of mutagenesis, selection and recombination. Again, this
process is entirely generic for the improvement of any whole cell biocatalyst for which a recombination vector and an
assay can be developed. Recursive recombination can be performed to increase the diversity of the pool at any step in
the process.

[0498] Efficient homologous recombination is important for the recursivity of the chromosomal evolution strategies
outlined above. Non-homologous recombination results in a futile integration (upon selection) followed by excision
(following counterselection) of the entire plasmid. Alternatively, if no counter-selection were used, there is integration of
more and more copies of plasmid / genomic sequences, which is unstable and also requires an additional selectable
marker for each cycle. Furthermore, additional non-homologous recombination will occur at random positions and may
or may not lead to good expression of the integrated sequence.

F. EXAMPLE 5: INCREASING THE RATE OF HOMOLOGOUS RECOMBINATION IN RHODOCOCCUS

[0499] A genetic approach is used to increase the rate of homologous recombination in Rhodococcus. Both targeted
and non-targeted strategies to evolve increases in homologous recombination are used. Rhodococcus recA is evolved
by DNA shuffling to increase its ability to promote homologous recombination within the chromosome. The recA gene
was chosen because there are variants of recA known to result in increased rates of homologous recombination in E.
coli, as discussed above.

[0500] The recA gene from Rhodococcus is DNA shuffled and cloned into a plasmid that carries a selectable marker
and a disrupted copy of the Rhodococcus homolog of the S. cerevisiae URA3 gene (a gene which also confers sensitivity
to the uracil precursor analogue 5-fluoroorotic acid). Homologous integration of the plasmid into the chromosome disrupts
the host uracil synthesis pathway, leading to a strain that carries the selectable marker and is also resistant to 5-fluoroorotic
acid. The shuffled recA genes are integrated, and can be amplified from the chromosome, shuffled again and cloned back
into the integration-selection vector. At each cycle, the recA genes promoting the greatest degree of homologous
recombination are those that are the best represented as integrants in the genome. Thus a Rhodococcus recA with
enhanced homologous recombination-promoting activity is evolved.

[0501] Many other genes are involved in several different homologous recombination pathways, and mutations in
some of these proteins may also lead to cells with an increased level of homologous recombination. For example,
mutations in E. coli DNA polymerase III have recently been shown to increase RecA-independent homologous
recombination. Resistance to DNA cross-linking agents such as nitrous acid, mitomycin and ultraviolet is dependent on
homologous recombination. Thus, increases in the activity of this pathway result in increased resistance to these agents.
Rhodococcus cells are mutagenized and selected for increased tolerance to DNA cross-linking agents. These mutants
are tested for the rate at which a plasmid will integrate homologously into the chromosome. Genomic libraries are
prepared from these mutants, combined as described above, and used to evolve a strain with even higher levels of
homologous recombination.

[0502] The foregoing description of the preferred embodiments of the present invention has been presented for purposes
of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise form
disclosed, and many modifications and variations are possible in light of the above teaching. Such modifications and
variations which may be apparent to a person skilled in the art are intended to be within the scope of this invention. All
patent documents and publications cited above are incorporated by reference in their entirety for all purposes to the
same extent as if each item were so individually denoted.
[0503]






SEQUENCE LISTING 

<110> MAXYGEN, INC. 

<120> EVOLUTION OF WHOLE CELLS AND ORGANISMS BY RECURSIVE 
SEQUENCE RECOMBINATION 

<130> P042372EP 

<140> EP 05077606.1 
<141> 1999-07-15 

<150> 09/354,922
<151> 1999-07-15 

<160> 15 

<170> PatentIn Ver. 2.0 

<210> 1 

<211> 330 

<212> PRT 

<213> Paralichthys olivaceus
<220> 

<223> Interferon 

<400> 1 

Met Ile Arg Ser Thr Asn Ser Asn Lys Ser Asp Ile Leu Met Asn Cys
1 5 10 15

His His Leu Ile Ile Arg Tyr Asp Asp Asn Ser Ala Pro Ser Gly Gly
20 25 30

Ser Leu Phe Arg Lys Met Ile Met Leu Leu Lys Leu Leu Lys Leu Ile
35 40 45

Thr Phe Gly Gln Leu Arg [illegible] Val Glu Leu Phe Val Lys Ser Asn Thr
50 55 60

Ser Lys Thr Ser Thr Val Leu Ser Ile Asp Gly Ser Asn Leu Ile Ser
65 70 75 80

Leu Leu Asp Ala Pro Lys Asp Ile Leu Asp Lys Pro Ser Cys Asn Ser
85 90 95

Phe Gln Leu Asp Leu Leu Leu Ala Ser Ser Ala Trp Thr Leu Leu Thr
100 105 110

Ala Arg Leu Leu Asn Tyr Pro Tyr Pro Ala Val Leu Leu Ser Ala Gly
115 120 125

Val Ala Ser Val Val Leu Val Gln Val Pro
130 135



<210> 2 
<211> 1485 
<212> DNA 

<213> Escherichia coli
<400> 2

gggattt tgg 

tcggcacggt 

cttcagcggc 

gtggcaacaa 

cagaacatat 

aaaacaaaca 

gctccatcat 

cgctt tcact 

tctacggacc 

agcgtgaagg 

cacgtaaact 

aggcactgga 

actccgtggc 

tgggccttgc 

agtccaacac 

gtaacccgga 

acatccgtcg 

tgaaagtggt 

acggcgaagg 

tcgagaaagc 

atgcgactgc 

gtgagttgct 

gcgtagcaga 

gcggccct tt 

gcctggtagg 



tcatgagatit 
ctggtttgct 
gaccgtgatg 
tttctacaaa 
tgactatccg 
gaaagcgttg 
gcgcctgggt 
ggatatcgcg 
ggaatcttcc 
taaaacctgt 
gggcgtcgat 
aatctgtgac 
ggcactgacg 
ggcacgtatg 
gcugctgatc 
aaccaccacc 
tatcggcgcg 
gaagaacaaa 
tatcaacttc 
aggcgogtgg 
c tgyctgaaa 
gcUgagcaac 
aaclaacgaa 
tgctttittta 
ccattttttg 



atcnnnnngc 
tttgccactg 
cggtgctjticg 
acact Cgata 
gtattacccg 
gcggcagcac 
gaagaccgt t 
cc tgggycag 
ggtaaaacca 
gcgtttatcg 
a tcgacaacc 
gccctggcgc 
ccgaaagcgg 
atgagccagg 
ttca tcaacc 
ggtggtaacg 
g t g a a r:) r. ri q g 
atcgc::.::cgc 
tacggc::inac 
tac.'J g c Liica 
ga taacccgg 
ccgaactcaa 
gat: t: t:i:t:aat 
cv;LLgt. fi.-Kjg 
ga tcL LCcicc 



ggccgcggcc 
cccgcggtga 
tcaggctiact: 
ctgta tgogc 
gcatgacaug 
tgggccagat 
ccat gga t gt 
gtggLcKocc 
cgctgacgct 
atgctgaaca 
tgctgtgctc 
gttciiggcgc 
aaatcgaagg 
cgatgcgtaa 
agatccgta t 
cgctgaaat t 
gcgaaa-'jcat 
cgt t cn--^oca 
tggttc:.ir:cr. 
aaggtg^::: 
aaaccgcij 
cgccgrj/u: 
eg ten 
ataiirp;,:.; . 
tagaticci. 



taagaggcca 
aggcattacc 
gcgtatgcat 
a tacagtata 
;]g taaaaatg 
tgagaaacaa 
ggaaaccatc 
ga tgggccgt 
gcaggtgatc 
cgcgctggac 
ccagccggac 
aytagacgtt 
cgaaatcggc 
gcrtggcgggt 
gaaaattggt 
ct:acgcctct 
gc<t:rjggtagc 
cjr;ct,gaattc 
(j<;gcgtaaaa 
;: tcggtcag 
c:;;agatcgag 
ci- ctgtagat 
(jT: Lacacaag 
£• uagaatcaa 
tc.aat 



gagaagcctg 
cggcgggatg 
tgcagacctt 
attgcttcaa 
gctatcgacg 
tttggtaaag 
tctaccggtt 
atcgtcgaaa 
gccgcagcgc 
ccaatctacg 
accggcgagc 
atcgtcgttg 
gactctcaca 
aacctgaagc 
gtgatgttcg 
gttcgtctcg 
gaaacccgcg 
cagatcctct 
gagaagctga 
ggtaaagcga 
aagaaagtac 
gatagcgaag 
ggtcgcatct 
catcccgtcg 



60 

120 

180 

240 

300 

360 

420 

480 

540 

600 

660 

720 

780 

840 

900 

960 

1020 

1080 

1140 

1200 

1260 

1320 

1380 

1440 

1485 



<210> 3 
<211> 1382 
<212> DNA 

<213> Escherichia coli 



<400> 3 

tgttggcacg 

atgctccaac 

cttgtggcaa 

caacagaaca 

acgaaaacaa 

aaggctccat 

gttcgctt tc 

aaatctacgg 

cgcagcgtga 

acgcacgcaa 

agcaggcact 

ttgactccgt 

acatgggcct 

agcagtccaa 

tcggtaaccc 

tcgacatccg 

gcgtgaaagt 

tctacggcga 

tgatcgagaa 

cgaacgcgac 

tacgtgagi: t 

aaggcguagc 

tctgcgaccc 

tc 



gt ctggct tg 


ct:ttUr;-:cor: 


tgcccg;:.;-.:l 




;K:-:catta 


cccggcggga 


60 


ggcgaccg tg 




tcgtca ': ;. 




::^-.C5tatg 


cattgcagac 


120 


caat t ticUac 


gaaoc: g 


atactic : a g 




i:.-. r.acagt 


ataattgctt 


180 


tattgactat 


ccggt: n i\.::c 


ccggcr : ; -;C 




gfi=;jv:gaaa 


atggctattg 


240 


acagaoagcg 


tl:gr;cg.ic.-g 


cactig-; : : 




t cgagaaa 


caatttggta 


300 


ca tgcgcctg 


g:jUc:a:iG.=!CC 


gt tec.-- ■ ; ;,i 


t ■; 


tggaaacc 


atctctaccg 


360 


actgcc'! la tc 


c: i^g ';cg 


cagg*.:'* ' ■ . e: 


c; " 


cqatgggc 


cgtatcgtcg 


420 


a cog gaa tcl: 




CCaC;l ' :C 


Cj ; 


:gcaggtg 


atcgccgcag 


480 


aggta£iaacc 




tcgaL^j::v.. .ja 


a . 


acgcgctg 


gacccaatct 


540 


actgggcgtc 


ga t a tcgaca 


acctgcicg tg 


c: V. 


cccagccg 


gacaccggcg 


600 


ggaaat ctgt 


gacgccctgg 


cgcgttctgg 


Cf ; 


cagtagac 


gttatcgtcg 


660 


ggcggcactg 


acgccgaaag 


cggaaaccqa 


aggcgaaatc 


ggcgactctc 


720 


tgcggcacgt 


atgatgagcc 


aggcga tgcg 


ca 


ngctggcg 


ggtaacctga 


780 


cacgctgctg 


atcttcatta 


accagatccg 


ta 


tgaaaatt 


ggtgtgatgt 


840 


ggaaaccact 


accggtggta 


acgcgctgaa 


at tctacgcc 


tccgttcgtc 


900 


tcgtatcggc 


gcggt.gaaag 


agggcgf^.j/ia 




itggtgggt 


agcgaaaccc 


960 


gg tgaagaac 




cgccgt: .-ri 




i^Kr'^ctgaa 


ttccaggtcc 


1020 


aggtatcaac 


tuctac: :';r;c;q 


aactgg; 


(:■■ 


:;: vi-i^cgta 


aaagagaagc 


1080 


agcaggcgcg 


tggtfKN^rjcu 


acaaac; it) in 




.loattggt 


cagggtaaag 


1140 


tgcctggctg 


a rsc] on v. '?'--. V. c 


cggaa.n-:;:: ■.: 


( ■ 


■'.r,';:,3gatt 


gagaagaaag 


1200 


gctgc^^gagc 


a a c c c: u 


caacgc ■■ : • 




'/.c; r.ctgga 


gatgatagcg 


1260 


agaaa ctaac 




aatcg ' ■. 


\. '■ 


•.■.r:;icacac 


aagggtcgca 


1320 


ttttgcUttt 


t: :] g \; ■:. r ; n 


agga t/- ■. 




"'^gaat 


caacatcccg 


1380 














1382 



<210> 4 
<211> 1430 
<212> DNA
<213> Escherichia coli



<400> 4

agaggccaga 

gcattactcg 

cgtatgcatt 

tacagtataa 

gtaaacatgg 

gagaaacaa t 

gaaaccatct 

atgggccgta 

caggtgatcg 

gcgctggacc 

cagcccgaca 

gtagacgtta 

gaaatcggcg 

ctggcgggta 

aaaattggtg 

tacgcctctg 

gtgggtagcg 

gctgaattcc 

ggcgtaaaag 

atcggtcagg 

gagatcgaga 

tctgtagatg 

atacacaagg 

cagaatcaac 



gaagcctgtc 
gcgggoo tgc 
gcagaccttg 
ttgct tcaac 
ctatcgacga 
t tgg t aaagcj 
ccaccggttc 
tcgtcgaaat 
ccgcagcgca 
caatctacgc 
ccggcgagca 
tcgtcgttga 
actctcncat 
acctgaagca 
tgatgttcgg 
ttcgtctcga 
aaacccqcgt 
aaa t:ccLcta 
agaagccgat: 
gtaoagcgaa 
agaaay tacg 
atagcgaagg 
gtcgcotictg 
at cccgtcgg 



ggcacqtjl; ct 
ttcagU;:jcq 
tcQcaiiCi.-i 'c 
agnacaun tt 
aaacaaacag 
ctcca t c.'itg 
get ttcactg. 
ct acggaccg 
gcgtgaagot 
acgtaaactg 
ggcactggaa 
ctccgtggcg 
gggcct tgcq 
gtccaocacg 
taacccggria 
caUccg tcgt 
ga/iagt:ri:; ;.g 
ego cga^ir;--; L 
eg -! gaarjrjca 
tgcgaci. gee 
tgag t tg^i^ig 
cgt agcr^ :'qa 
cggccct: 1: 1 1 
cctgg t aggc 



ggtttgc:c * t 
accgtg/! 
ttctac^i ■;."!:■■ 
gactatf:.:*;g 
aaagcgt i..*jq 
cgcctggg*:. g 
gatatcgcnc 
gaatct t.cc g 
aaaacct.gt: g 
ggcgtcnn t; a 
atctgtgacg 
gcactgacgc 
gcacgtatga 
ctgctga t.'.Tt 
accacta<:cg 
atcggcg':gf; 
aagaac/.ii;'^ m 
atcaac. . . 
ggcgcgr -; ; 
Lggct:g/i i ■ g 
etgagt.-.i-f:--: 
actaacg- -v: 
gctttt t 
catttti: :: .: 



tgccactgcc 
gg vgcgtcgt 
CP. ;:cLgatac 
ta -., tacccgg 
cggcagcact 
aagnccgttc 
ttcgggcagg 
gtaaaaccac 
cgtttatcga 
tcgacaacct 
ccctggcgcg 
egaoagcgga 
tgagccaggc 
tea t:caacca 
gtggtaacgc 
tignaagaggg 



I' 



a : 

C'. 

a L 
gut 

a I . !. 



•:cv.gcgcc 
■.■■y ;:aact. 

ac a a 
jracccgga 
■ act,eaac 
i.LtLaatc 
-'5ggga 
'ccacct 



cgcggtgaag 
caggctactg 
tgtatgagca 
catgacagga 
gggccaga tt 
catggatgtg 
tggtctgccg 
gctgacgctg 
tgctgaacac 
gctgtgctcc 
ttctggcgcg 
aatcgaaggc 
gatgcgtaag 
gatccgtatg 
gctgaaattc 
cgaaaacgtg 
gtttaaacag 
ggttgacctg 
aggtgagaag 
aaccgcgaaa 
gccggatttc 
gtcttgtttg 
tatgccatga 



60 

120 

180 

240 

300 

360 

420 

480 

540 

600 

660 

720 

780 

840 

900 

960 

1020 

1080 

1140 

1200 

1260 

1320 

1380 

1430 



<210> 5 
<211> 1380 
<212> DNA 

<213> Escherichia coli



<400> 5 

cggcagggtc 
cttcagcggc 
gtggcaacaa 
cagaacatat 
agaacaaaca 
gctccatea t: 
cgctttcact 
tctacggacc 
agcgtgaagg 
gcacgtaaac 
caggcactgg 
gactccgtag 
atgggccttg 
ttgtccaaca 
ggtaacccgg 
gacatccgtc 
gtgaaagtgg 
tacggcgaag 
atcgagaaag 
aatgcggct g 
cgtgagttgc 
ggcgtagcag 
tgcggcccL t: 



tggtttgctt 


ttgceaetige 


e eg egg; 




' a tec 


ggcgggaatg 


60 


ggecgr.gatg 


egg i:gcg:: g 


teaggcv.. : 




gcat 


tgcagacctt 


120 


t U ueta caaa 


ac';cct: g -1 = a 


etg t a t c; i ; : 




Lata 


attgcttcga 


180 


tgactaiccg 


g !■ ■ ': t:a;:' .:g 


gcat ga:. ■ ; 




; -atg 


gctatcgacg 


240 


gaangeg f.tg 


g egg cog :ne 


tggyce--' : >. 




■:acaa 


tttggtaaag 


300 


gcgcc!:gggt: 


gaagacc; ;:t: 


ecatggr -g: 


g: 


; catc 


tctaccggtt 


360 


gga t:a Leg eg 


cttgggr:';.-jg 


gtggtci 


gai :;g 


gjcgt 


atcgtcgaaa 


420 


ggaa i:ctt:cc 


ggV.aaaacca 


eactgac:.. 




*. gate 


gccgcagcgc 


480 


t aaaacet g t 


tgcgLt Lai.c 


gatgctg.> r..: 






cccaatctac 


540 


tgggcgtcga 


ta tcgacaac 




ceccig 


cegga 


caccggcgag 


600 


aaa t ctgtga 


cgcectggcg 


cgttct gg:'r:g 


eag*;;^ 


gacgt 


tatcgtcgtt 


660 


cggcactgac 


gccgaaagcg 


gaaatcgaag 


gcgoa 


^itcgg 


cgactctcac 


720 


cggcacgtat 


gatgagccag 


gcgatgcgta 


agctggcggg 


taacctgaag 


780 


egctgctga t 


ct 1 1 atcaac 


cagatccg ta 


t gaari 


jttgg 


cgtgatgttc 


840 


aaaecaccac 


cggtgg taac 


gcgctga-ia t 


t e t: a c 


gcctc 


tgttcgtctc 


900 


gtatcggtgc 


ggtgaaagag 


ggcgaaa-K'.:g 




•;gtag 


cgaaacccgc 


960 


t g a a g a a c a a 


a a : ogctg'*:g 


ccgtttn -• 'K': 




■latt 


ccagatcctc 


1020 


gt. a tcaact t 


ct..neggegr:,-i 


ct gg t l:g-. ■: 




- ':aaa 


agagaagctg 


1080 


caggcgcgtg 


g-K;figr::. ,c 


aaaggtg ■ ■i 


a-. ■ ■ 


: tea 


gggtaaagcg 


1140 


cetggcl'gaa 


ag :l:aa!;r;-ci 


g.^f'if3ecf; . -i 




~ cga 


gaagaaagta 


1200 


tgei:gagcaa 


cc-; ':aa -':-. \n 


acgccgc, ' 




~ :aga 


tgatagcgaa 


1260 


on. 'let a a eg a 


aga 1 1 : '. -^a 


teg Let t • t 


t. -'..r- 


::Tcga 


gggtcgcatc 


1320 


ttgc:ttttt L 


aag:. tigii- n g 


gatatge ; u 


ga; 


ra tea 


acatccagtc 


1380 



<210> 6 
<211> 130 
<212> DNA
<213> Escherichia coli



<400> 6 






agaggccaga 


gaaqccagtt 


ggcricqqt.ct 


ggt t tgC. 


!-t: 


tgcc^i 


cr.gcc 


cggggtgagg 


60 


gcattacccg 


gcgggaatgc 


ttcp.gcqr:cg 


accgtga:.: 






f^:tcgt 


caggctactg 


120 


cgtatgcact 


gcagacctcg 


tgq<:aacr;,-j t 


t tctaca.-i; 


T a 






tgtatgagca 


180 


tgcagtataa 


ttgct tcaac 


agsacar.of. t 


gactat cc: 




tar.t-..- 


c: ccgg 


catgacagga 


240 


gtaaaaatgg 


ctat tgacga 


aaacaaacag 


aaagcg tl; 




cggca 


qcact 


gggccagatt 


300 


gagaaacaat 


ttggtaaagg 


ctccatcatg 


cgcctgq^: 


■a 


aagac; 


cqttc 


catggatgtg 


360 


gaaaccatct 


ctactggttc 


gctttcactg 


gatatcgci 




ttggg 


ocagg 


tggtctgccg 


420 


atgggccgta 


tcgtcgaaat 


ctatiggaccg 


gaatct tccg 


gtaaa 


;,;ccac 


actgacgctg 


4 8.0 


caggtgatcg 


ccgcagcgca 


gcgugaggqt 


aaaacctqtq 


cot t la toga 


tgctgaacac 


540 


gcgctggacc 


caatctacgc 


acgtioaactq 


ggcgt cga 


v.a 


tcqa c 


aacct 


gctgtgctcc 


600 


cagccggaca 


ccggcgagca 


ggcacCgg-ia 


acctgtc.. 




cccl: (] 


:icgcg 


ttctggcgct 


660 


gtagacgtta 


tcgtcgttga 


ct ccqtgocq 


gcactgtr. 




cc: n n a 


ocgga 


aatcgaaggc 


720 


gaaatcggcg 


act:ctcncat 


gggcctr.ccq 


gcacgta:: 


':a 


tc;-oc 


c;jqgc 


aatgcgtaag 


780 


ctggcgggua 


acclgaagca 


gtccaacacq 


ctgctga:. 


ct 


teat c 


/>acca 


gatccgtatg 


840 


aaaattggtg 


tgaUgttcgg 


taacccqo.^ a 


accaccac: 


eg 


g*-^r-- 


Twicgc 


gctgaaattc 


900 


tacgcctcUg 


ttcgtctcga 


catccgiicqt; 


atcqgcqr: 




tg;w:o 


r- jggg 


cgaaaacgtg 


960 


gtgggtagcg 


aaacccgcgt 


gnaaqtrj.;; ':q 


aaqaaca- 


^■a 


tc..,r;;. 


g.:gcc 


gtttaaacag 


1020 


gctgaattcc 


agatcctcta 


cqqcqa.iq ''jt: 


atcaact:' 


':t 


ac;v;.: 


G'iact 


ggttgatctg 


1080 


ggcgtaaaag 


agaaqctgat 


cgaqaaa::ca 


ggcgcglq 


ot 


ac;^':c; 


I'.jcaa 


aggtgagaag 


1140 


gttggtcaqg 


gtaaagcgaa 


tqcqcJCLfiCC 


tggctga/! 


;-q 


at.;-;;; 


•v-cqga 


aaccgcgaaa 


1200 


gagatcgacja 


agaariqtacg 


t g a (. J t: t q c L q 


ctgagca-^ 




ccna;. 


■ ::aac 


gccggatttc 


1260 


tctgtagaug 


otagcqaagq 


cyuaqcagaa 


actaaccj'. 




a L : t 


V -^atc 


stcttgtttg 


1320 


atacacaagg 


gccgcaUctg 


cgq 












1343 




<210> 7 
<211> 1379 
<212> DNA

<213> Escherichia coli



<400> 7 






gaggccagag 


aaqcccrjtcg 


gcttqqi.c 


i:q 


q*c ttgct;.' 


'c 


acc: 




:r;ccc 


gcggtgaagg 


60 


cattacccgg 


cqqqaa tiqct 


tcaqcq;:^ 




ccqtgat-- 


"9 


gi- 




.qtc 


aggctactgt 


120 


gtatgcactq 


ca:j.)c;f:t tgt 


ggc-iac:- 


t: 


tctacao 


c 


ac 




i acc 


gtatgagcac 


180 


acagtataat 


cc c t. \. cgaca 


gnactit^v. 


■ g 


acta tcc' 


;t 


at ' 




:qgc 


atgacaggag 


240 


taaaaatggc 


t a t t.(;.u:gaa 


a a C'~'r:a<'. ■ 


';a 


aaqcqtt- 


.c 


q c . 




^ ,ctg 


ggccagattg 


300 


agaaacagu t 


t qqf.anaggc 


tcc.^ t CO : 




gcctggqr. 


: i 


a ;-: 


■ c. c 


- tcc 


atggatgtgg 


360 


aaaccatccc 


'caccqqttcg 


ctttcacr 


^|g 


atatcgc ■ 




. . 


ig ■ 


■ggt 


ggtctgccga 


420 


tgggccqtat 


cqtcq aaatc 


tacqqac cgq 


aatcttc 


■ .i 


ta^- 


:ar:S 


:acg 


ctgacgctgc 


480 


aggtgatcqc 


cqcaqcgcag 


cgtgaaggta 


aaacctgt ! 


■jC 


gttta; 


:cqat 


gctgaacacg 


540 


cgctggaccc 


gatcliacqca 


cqtaaact gg 


gcgtcga t. 




cgrT- 


■ ca.; 


icctg 


ctgtgctccc 


600 


agccggacac 


cggcgagcag 


gcactqgaaa 


tctgtgacqc 


cctqq: 


:gcgc 


tctggcgcag 


660 


tggacgttat 


cgtcqt tgac 


tccqtqgc 




cactgacr. 




q <y r, 


■ aqr 


:qqaa 


atcgaaggcg 


720 


aaatcggcga 


ctct cacatg 


gc{:cttqcaq 


cacgtatq. 




Oil- 




ggcg 


atgcgtaagc 


780 


tggcgggtaa 


cct qaagcag 


tccaacacoc 


tqctgat c-: 




en : 


.r:(M 


iccag 


atccgtatga 


840 


aaattggtqt 


qatq ttcggt 


aacccgga 


a a 


ccactacc 


ng 


tc' 


i t Ifi ' 


;cqcg 


ctgaaattct 


900 


acgcctctqt 


t cqtctcqac 


atccqtcq 


t a 


tcggcac': 


r.:t 


ga.; 


: t": 'V ■} 


^:^ggc 


gaaaacgtgg 


960 


tgggtagcqa 


aaccccjcqtg 


aaaqtqq;. 


qa 


oqaacaa-- 


■ t 


C". ; 




ccg 


tttaaacagg 


1020 


ctgaattcca 




gacqaa 


a 


tcaa ct t 


• a 


c: 




c tg 


gttgacatgg 


1080 


gcgtaaaaqa 


r^^wifjctqatc 


gacAaac 




gcqcqtq . 


.a 






■ ' aaa 


ggtgagaagg 


1140 


ccggtcagqg 


■•.aaaq cqaat 






g<jctqaa 


:a 


ta 




: qaa 


accgcgaaag 


1200 


agatcgagaa 


c.i aacjiiacgt 


gaqv. i;qc:: 


■■|C 


tyagcaa: 


,C 


q--: 




^acg 


ccggatttct 


1260 


ctgtagatqa 


V.aqcqaaggc 


gt.-qcaq-- 


■ ■ a 


ct aacga ; 








acg 


tcttgtttga 


1320 


tacacaagqg 


t cgc;^ uctgc 


gqccct I. 


■ g 


ctttttt- 


.) 


t ■ 


: t ■ 


-;iat 


atgccatga 


1379 



<210> 8 
<211> 358 
<212> PRT 

<213> Escherichia coli
<400> 8

Met Thr Gly Val Lys Met AIp: Tie Asp G.li; Asn l.y.s Gin Lys Ala Leu 
1 5 1-: 15 

Ala Aln Ala Leu Gly Gin Tie r,lu Lys Gin Phe C] y l.ys Gly Ser He 
20 25 30 

Met Arq Leu Gly Glu Asp Arg Ser Met Asp Val Glu Thr He Ser Thr 
35 40 45 

Gly Ser Leu Ser Leu Asp He Ala Leu Gly Ala Gly Gly Leu Pro Met 

50 55 60 

Gly Arq He Val Glu He Tyr Gly Pro Glu Ser Ser Gly Lys Thr Thr 
65 70 75 80 

Leu Thr Leu Gin Val He Al/,i Ala A.I a Gl.- Arg Glu Gly Lys Thr Cys 
85 95 

Ala Phe He A;;n Ala G]m H.i.r? Ala Leu Asf^ Pro T:o Tyr Ala Arg Lys 
J 00 105 110 

Leu Gly Val Asp He Asp Asn Leu Leu Cys "er Gin Pro Asp Thr Gly 

115 120 :25 

Glu Gin Ala Leu Glu He Cys Asp Ala Leu Ala Arg Ser Gly Ala Val 

130 135 ) V) 

Asp Val He Val Val Asp Ser Val Ala Al;; Leu Thr Pro Lys Ala Glu 
145 150 155 160 

He Glu Gly Glu He Gly Asp Ser His Met Gly Leu Ala Ala Arg Met 
165 11- 175 

Met Ser Gin Ala Met Aro Lys Leu Ala Gly Asn Tv?v] Lys Gin Ser Asn 
mo 105 190 

Thr Leu Leu Lie Phe He Asn Gin 13 e Arr ^'.et Lys Tie Gly Val Met 

1"5 200 : 05 

Phe Gly Asn Pro Glu Thr Thr Thr Gly Gly Asn Ala Leu Lys Phe Tyr 

210 215 220 

Ala Ser Val Arg Leu Asp He Arg Arg He Gly Ala Val Lys Glu Gly 
225 230 235 240 

Glu Asn Val Val Gly Ser Glu Thr Arg Val Lys Val Val Lys Asn Lys 

245 ■ 250 255 

He Ala Ala Pro Phe Lys Gin Alo Glu Pho Gin He Leu Tyr Gly Glu 

2r0 2S5 270 

Gly He Asn Vho Tyr Gly Glu J,eu Vnl As;; Leu (My Val Lys Glu Lys 

2/5 280 ^B5 

Leu He Giu Lys Ala Gly Ala Trp Tyr Ser Tyr Lys Gly Glu Lys He 

290 295 ?.0C 

Gly Gin Gly Lys Ala Asn Ala Thr Ala Trp Leu Lys Asp Asn Pro Glu 
305 310 315 320 
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Thr Ala Lys G.lu lie Glu hys hys Val Arg Glu hcu Leu Leu Ser Asn 
325 330 335 

Pro Asn Sex Thr Pro Asp Phc Ser Val A:sp Asp Ser Glu Gly Val Ala 
310 3^15 * 350 

Glu Thr Asn Glu Asp Phe 
355 



<210> 9 
<211> 358 
<212> PRT 

<213> Escherichia coli 
<400> 9 

Met Thr Gly Val Lys Met hid lie Asp G'iu Asn j..ys Gin Lys Ala Leu 
1 5 :o 15 

Ala Thr Ala Lou Gly Gin lie Glu Lys Gin Phc; Gly Lys Gly Ser lie 
20 25 30 

Met Arg Leu Gly Glu Asp Arg Ser Met Asp Val Glu Thr lie Ser Thr 
35 4 0 4 5 

Gly Ser Leu Ser Leu Asp lie Ala Leu Gly Ala Gly Gly Leu Pro Met 
50 55 60 

Gly Arg lie Val Glu lie Tyr Gly Pro G^u Ser Ser Gly Lys Thr Thr 
65 70 75 80 

Leu Thr Leu Gin Val lie Ala Ala Ala G!n Arc Glu Gly Lys Thr Cys 
85 0 95 

Ala Phe lie Asp Ala Glu )l\ Ala Leu A:p Pro Tie Tyr Ala Arg Lys 
^30 105 110 

Leu Gly Val Asp He Asp Asn Leu Leu Cys Ser Gin Pro Asp Thr Gly 
115 120 125 

Glu Gin Ala Leu Glu He Cys Asp Ala Leu Ala Arg Ser Gly Ala Val 

130 135 I'^O 

Asp Val He Val Val Asp Ser Val Ala Ala Leu Thr Pro Lys Ala Glu 
145 150 155 160 

He Glu Gly Glu He Gly Asp Ser. Mis Mot Gly Leu Ala Ala Arg Met 
165 10 175 

Met Ser Gin Ala Met Arg Lys Leu Ala G';.y Asn Leu Lys Gin Ser Asn 
IBO 185 ■ 190 

Thr Leu Leu Tie Phe He Asn Gin He Arg Met Lys He Gly Val Met 

1D5 200 " 205 

Phe Gly Asn Pro Glu Thr Thr Thr Gly Gly Asn Ala Leu Lys Phe Tyr 

210 '215 2;:o 

Ala Ser Val Arg Leu Asp Ile Arg Arg Ile Gly Ala Val Lys Glu Gly
225 230 235 240



Glu Asn Val Val Gly Ser Glu Thr Arg Val Lys Val Val Lys Asn Lys 
245 250 255 

lie Ala Ala Pro Phe Lys Gin Ala Glu Phe Gin Val Leu Tyr Gly Glu 
260 265 270 

Gly lie Asn Phe Tyr' Gly Glu Leu Val Asp Leu Gly Val Lys Glu Lys 
275 280 285 

Leu lie Glu Lys Ala Gly Ala Trp Tyr Ser Tyr Lys Gly Glu Lys He 

290 295 300 

Gly Gin Gly Lys Ala Asn Ala Thr Ala Tro Leu Lys Asp Asn Pro Glu 
305 310 315 320 

Thr Ala Lys Glu He Glu Lys Lys Val Arq Glu Lgu Leu Leu Ser Asn 
325 3/0 335 

Pro Asn Ser Thr Pro Asp Phe Ser Gly Asp Asn Ser Glu Gly Val Ala 
340 345 350 

Glu Thr Asn Glu Asp Phe 

355 



<210> 10
<211> 358
<212> PRT

<213> Escherichia coli

<400> 10 

Met Thr Gly Val Asn Met Ala Ile Asp Glu Asn Lys Gln Lys Ala Leu
1 5 10 15

Ala Ala Ala Leu Gly Gln Ile Glu Lys Gln [illegible] Gly Lys Gly Ser Ile
20 25 30

Met Arg Leu Gly Glu Asp Arg Ser Met Asp Val Glu Thr Ile Ser Thr
35 40 45

Gly Ser Leu Ser Leu Asp Ile Ala Leu Gly Ala Gly Gly Leu Pro Met
50 55 60

Gly Arg Ile Val Glu Ile Tyr Gly Pro Glu Ser Ser Gly Lys Thr Thr
65 70 75 80

Leu Thr Leu Gln Val Ile Ala Ala Ala Gln Arg Glu Gly Lys Thr Cys
85 90 95

Ala Phe Ile Asp Ala Glu His Ala Leu Asp Pro Ile Tyr Ala Arg Lys
100 105 110






Leu Gly Val Asp Ile Asp Asn Leu Leu Cys Ser Gln Pro Asp Thr Gly
115 120 125

Glu Gln Ala Leu Glu Ile Cys Asp Ala Leu Ala Arg Ser Gly Ala Val
130 135 140

Asp Val Ile Val Val Asp Ser Val Ala Ala Leu Thr Pro Lys Ala Glu
145 150 155 160

Ile Glu Gly Glu Ile Gly Asp Ser His Met Gly Leu Ala Ala Arg Met
165 170 175

Met Ser Gln Ala Met Arg Lys Leu Ala Gly Asn Leu Lys Gln Ser Asn
180 185 190

Thr Leu Leu Ile Phe Ile Asn Gln Ile Arg Met Lys Ile Gly Val Met
195 200 205

Phe Gly Asn Pro Glu Thr Thr Thr Gly Gly Asn Ala Leu Lys Phe Tyr
210 215 220

Ala Ser Val Arg Leu Asp Ile Arg Arg Ile Gly Ala Val Lys Glu Gly
225 230 235 240

Glu Asn Val Val Gly Ser Glu Thr Arg Val Lys Val Val Lys Asn Lys
245 250 255

Ile Ala Ala Pro Phe Lys Gln Ala Glu Phe Gln Ile Leu Tyr Gly Glu
260 265 270

Gly Ile Asn Phe Tyr Gly Glu Leu Val Asp Leu Gly Val Lys Glu Lys
275 280 285

Leu Ile Glu Lys Ala Gly Ala Trp Tyr Ser Tyr Lys Gly Glu Lys Ile
290 295 300

Gly Gln Gly Lys Ala Asn Ala Thr Ala Trp Leu Lys Asp Asn Pro Glu
305 310 315 320

Thr Ala Lys Glu Ile Glu Lys Lys Val Arg Glu Leu Leu Leu Ser Asn
325 330 335

Pro Asn Ser Thr Pro Asp Phe Ser Val Asp Asp Ser Glu Gly Val Ala
340 345 350

Gly Thr Asn Glu Asp Phe
355



<210> 11
<211> 358
<212> PRT

<213> Escherichia coli
<400> 11

Met Thr Gly Val Lys Met AIn He Asp Gin Asn J.ys Gin Lys. Ala Leu 
1 5 10 15 

Ala Ala Ala Leu Gly Gin He Glu Lys Gin Phe f;:v !.ys Gly Ser He 
20 25 30 

Met Arg Leu Gly Glu Asp Arg Ser Met Asp Val Glu Thr He Ser Thr 

35 AO AS 

Gly Ser Leu Ser Leu Asp He Ala Leu Gly Ala Giy Gly Leu Pro Met 

• 50 5 5 

Gly Arc; Me Val Glu He Tyr Gly Pro Glu Ser S.-r Gly Lys Thr Thr 
65 70 75 80 
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Leu Thr Leu Gin Val He Ala Ala Ala Gin Arg Glu Gly Lys Thr Cys 
85 90 95 

Ala Phe He Asp Ala Glu His Ala Leu Asp Pro He Tyr Ala Arg Lys 
100 105 110 

Leu Gly Val Asp He Asp Asn Leu Leu Cys Ser Gin ;'ro Asp Thr Gly 
115 120 ' I2b 

Glu Gin Ala Leu Glu He Cys Asp Ala Leu Ala Arg Ser Gly Ala Val 
130 135 1^0 

Asp Val He Val Val Asp Ser Val Aln Alci Leu Thr Pro Lys Ala Glu 
145 150 155 160 

He Glu Gly Glu He Gly Asp Sor His Met Gly Lnu Ala Ala Arg Met 
165 170 175 

Met Ser Gin Ala Met Arg Lys Leu Ala Gly Asn Lou Lys Leu Ser Asn 
1«0 185 190 

Thr Leu }-eu He Phe He Asn Gin He Arg Met L^'S Tie Gly Val Met 

1 '-5 200 ;■ ":5 

Phe Gly Asn Pro Glu Thr Thr Thr Gly Gly Asn Al.o Leu Lys Phe Tyr 

210 2}5 ?.?.': 

Ala Ser Val Arg Lou Asp Ho Arg Arg Ho Gly A^n Val Lys Glu Gly 
225 230 235 240 

Glu Asn Val Val Gly Ser Glu Thr Arg Val Lys V.-1 Val Lys Asn Lys 
2*15 250 255 

He Ala Ala Pro Phe Lys Gin Ala Glu Phe Gin Ho Leu Tyr Gly Glu 

265 270 

Gly He Asn Phe Tyr Gly Glu Leu Val Asn Leu Gly "^il Lys Glu Lys 
2 /5 2R0 '5 

Leu He Glu Lys Ala Gly Ala Trp Tyr Sor Tyr Lvr^ Hly Glu Lys He 
290 2 95 J 

Gly Gin Gly Lys Ala Asn Ala Ala Ala Trp Leu Lvs Gly Asn Pro Glu 
305 310 315 320 

Thr Ala Lys Glu He Glu Lys Lys Val Arg Glu Leu Leu Leu Ser Asn 

325 330 335 

Pro Asn Ser Thr Pro Asp Phe Ser Arg Asp Asp Ser Glu Gly Val Ala 

2^.0 3^15 350 



Glu Thr Asn Glu Asp Phe 

3!- 5 



<210> 12 
<211> 358 
<212> PRT 

<213> Escherichia coli
<400> 12

Met Thr Gly Val Lys Met Ala Ile Asp Glu Asn Lys Gln Lys Ala Leu
1 5 10 15

Ala Ala Ala Leu Gly Gln Ile Glu Lys Gln Phe Gly Lys Gly Ser Ile
20 25 30

Met Arg Leu Gly Glu Asp Arg Ser Met Asp Val Glu Thr Ile Ser Thr
35 40 45

Gly Ser Leu Ser Leu Asp Ile Ala Leu Gly Ala Gly Gly Leu Pro Met
50 55 60

Gly Arg Ile Val Glu Ile Tyr Gly Pro Glu Ser Ser Gly Lys Thr Thr
65 70 75 80

Leu Thr Leu Gln Val Ile Ala Ala Ala Gln Arg Glu Gly Lys Thr Cys
85 90 95

Ala Phe Ile Asp Ala Glu His Ala Leu Asp Pro Ile Tyr Ala Arg Lys
100 105 110

Leu Gly Val Asp Ile Asp Asn Leu Leu Cys Ser Gln Pro Asp Thr Gly
115 120 125

Glu Gln Ala Leu Glu Ile Cys Asp Ala Leu Ala Arg Ser Gly Ala Val
130 135 140

Asp Val Ile Val Val Asp Ser Val Ala Ala Leu Ser Pro Lys Ala Glu
145 150 155 160

Ile Glu Gly Glu Ile Gly Asp Ser His Met Gly Leu Ala Ala Arg Met
165 170 175

Met Ser Gln Ala Met Arg Lys Leu Ala Gly Asn Leu Lys Gln Ser Asn
180 185 190

Thr Leu Leu Ile Phe Ile Asn Gln Ile Arg Met Lys Ile Gly Val Met
195 200 205

Phe Gly Asn Pro Glu Thr Thr Thr Gly Gly Asn Ala Leu Lys Phe Tyr
210 215 220

Ala Ser Val Arg Leu Asp Ile Arg Arg Ile Gly Ala Val Lys Glu Gly
225 230 235 240

Glu Asn Val Val Gly Ser Glu Thr Arg Val Lys Val Val Lys Asn Lys
245 250 255

Ile Ala Ala Pro Phe Lys Gln Ala Glu Phe Gln Ile Leu Tyr Gly Glu
260 265 270

Gly Ile Asn Phe Tyr Gly Glu Leu Val Asp Leu Gly Val Lys Glu Lys
275 280 285

Leu Ile Glu Lys Ala Gly Ala Trp Tyr Ser Tyr Lys Gly Glu Lys Val
290 295 300

Gly Gln Gly Lys Ala Asn Ala Thr Ala Trp Leu Lys Asp Asn Pro Glu
305 310 315 320

Thr Ala Lys Glu Ile Glu Lys Lys Val Arg Glu Leu Leu Leu Ser Asn
325 330 335

Pro Asn Ser Thr Pro Asp Phe Ser Val Asp Asp Ser Glu Gly Val Ala
340 345 350

Glu Thr Asn Glu Asp Phe
355



<210> 13 
<211> 358 
<212> PRT 

<213> Escherichia coli 


<400> 13 

Met Thr Gly V/-1 Lys Met Ala lie Ar.p Glu Asn ],ys Gin Lys Ala Leu 
1 5 * 10 * 15 

Ala Ala Ala Leu Gly Gin lie Glu Lys G.l n Phe Gly Lys Gly Ser lie 
20 20 25 30 

Met Arg Leu Gly Glu Asp Arcj Ser Met Asp Val GL'.j Thr lie Ser Thr 
35 40 AS 



Gly Ser Leu Ser Leu Asp lie Ala Leu Gly Ala Gly Gly Leu Pro Met 

50 55 60 

Gly Arg lie V<-b1 Glu lie Tyr Gly Pro G.1u Ser Ser Gly Lys Thr Thr 
65 70 75 80 

30 Leu Thr Leu Gin Val lie Aln Ala Ala Gin Arg Glu Gly Lys Thr Cys 

05 90 95 

Ala Phe lie Asp Ala Glu Uxs Ala Leu Asp Pro Tie Tyr Ala Arg Lys 
100 105 110 

Leu Gly Val Asp lie Asp Asn Leu Leu Cys Ser Gin Pro Asp Thr Gly 

115 120 ,125 

Glu Gin Ala Leu Glu He Cy.'i Asp Ala Lou Ala Arq Ser Gly Ala Val 
130 135 } 

Asp Val He Vol Val Asp Ser Val Ala Ala Leu Thr Pro Lys Ala Glu 
145 150 155 ' 160 

He Glu Gly Glu He Gly Asp Ser His Met Gly Leu Ala Ala Arg Met 
165 .170 175 

Met Ser Gin Ala Met Arg Lys Leu Ala Gly Asn Leu Lys Gin Ser Asn 

.1 iiO in5 ■ 190 

Thr Leu Leu He Phe He Asn Gin He Arg Met Lys Tie Gly Val Met 

195 200 /05 

Phe Gly Asn Pro Glu Thr Thr Thr Gly Gly Asn Ala Leu Lys Phe Tyr 

210 215 

Ala Ser Val Arg Leu Asp He Arg Arg He Gly Thr Val Lys Glu Gly 
55 225 230 235 240 






EP 1 707 641 A2 



Glu Asn Va.l Val Gly Ser Gl.u Thr Ary Vol l.ys Vol Val "Lys Asn Lys 
215 :-!50 255 

lie Ala Ala Pro Phe Lys Gin Ala Glu Phc Gin lie !.eu Tyr Asp Glu 

260 265 270 

Gly lie Asn Phe Tyr Gly Glu T.c^u Val Asp Met Gly Lys Glu Lys 

275 230 






Leu lie Glu Lys Ala Gly Ala Trp Tyr Ser Tyr Lys Gly Glu Lys Ala 
290 295 3C0 



Gly Gin Gly Lys Ala Asn Ala Thr Ala Trp Leu Lys Asp Asn Pro Glu 
305 310 Jib 320 






Thr Ala Lys Glu lie Glu Lys Lys Val Arq Glu Leu Leu Leu Ser Asn 

325 33;) 335 






Pro Asn Ser Thr Pro Asp Phe Ser Val Asp A.sp Ser Glu Gly Val Ala 
340 345 350 

Glu Thr Asn Glu Asp Phe 

355 



<210> 14 
<211> 1398 






<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: consensus
E. coli sequence



<400> 14 

agaggccaga gaagcctgtc ogc;(Jcqqtct gcjtt tgct: 1 1 tqccr. c':.';r:c cgcggtgaag 60 

gcattacccg gcgggaatgc ttcagcggcg accgtgatgc ggttrccvcqt caggctactg 120 

35 cgtatgcatt gcagaccttg togcaoc^iat ttctacarjnn cac;.: ; :^ uac tgtatgagca 180 

tacagtatao ttgcttcaac agaacci ; .-i tt gactatcccjg L.j: : ccgg catgacagga 240 

gtaaaaatgg ctattgacga aaaca^Jcicag aaagcgttgg eg;:;:.: ;cact gggccagatt 300 

gagaaacaat ttggtaaagg ctccat:CrJtg cgcctgggtg as:>- :-:::ttc catggatgtg 360 

gaaaccatct ctaccggttc gctttcactg gatatcgcgc ti: :n(jcagg tggtctgccg 420 

atgggccgta tcgtcgaaat ctacggaccg gaatcttccg guaar-]accac gctgacgctg 480 

caggtgatcg ccgcagcgca gcgtgaaggt aaaacctgtg cgtttatcga tgctgaacac 540 

gcgctggacc caatctacgc acgtaaactg ggcgtco,! t.a tcc^Krancct gctgtgctcc 600 

cagccggaca ccggcgagca ggcacuggaa atctgtigacg ccctcgcgcg ttctggcgca 660 

gtagacgtta tcgtcgttga ctccgtggcg gcactnacgc cgaangcgga aatcgaaggc 720 

gaaatcggcg actctcacat gtj'jc-: ' gcg gcacc'. .i tgn t. : > .r.rjygc gatgcgtaag 780 

45 ctggcgggta (jcctg.iagca gtccaacacg ctgcr.jjatct tcai:c;3»jcca gatccgtatg 84 0 

aaaattggtg tgatgttcgg taacccggaa accrictaccg gtggtaacgc gctgaaattc 900 

tacgcctctg ttcgtctcga ca r; ccqtcgt atcgccqccq tcaatT-aggg cgaaaacgtg 960 

gtgggtagcg aaacccgcgt fj.i...Kj:::jfjtg aagaat:a»w;n t;:^ . -:cc gtttaaacag 1020 

gctgaattcc agatcctcta c(r]C(y\-^qgt atcnacctot a-- , - ggttgacctg 1080 

ggcgtaaaag agaagctgat c:!P;]a.- -i'lca ggcnc:;:: t.;; : n- :aa aggtgagaag 1140 

atcggtcagg gtaaagcgaa t-j'j;j'? ' ' cc t ggctg^i.-j : <r. . vjga aaccgcgaaa 1200 

gagatcgaga agaaagtacg tgagt i^tg ctgagcancc c; -oac gccggatttc 1260 

tctgtagatg atagcgaagg c<jtn^jc.^'7aa ac:i:aacn;ir-r; a- . itc gtcttgtttg 1320 

atacacaagg ytcgcatctg c-j; 1. 1; t gcttttv.;;:! o ^gga tatgccatga 1380 

cagaatcaac atcccgtc 1398 



<210> 15
<211> 358
<212> PRT

<213> Artificial Sequence

<220>

<223> Description of Artificial Sequence: consensus
E. coli sequence

<400> 15 

Met Thr Gly Val Lys Met Ala He Asp C:^1.ij Asn l,y.s Gin Lys Ala Leu 
1 5 .10 15' 

Ala Ala Ala Leu Gly Gin Mo Glu Ly.-s C,}u Phe Gly Lys Gly Ser He 
20 ]/j 30 

Met Arg Leu Gly Glu Aj:;p Arq Ser Mot Asp Val G ! u Thr He Ser Thr 
35 40 '15 

Gly Ser Leu Ser Leu Asp T.lc Ala Lou Gly Ala r:;y Gly Leu Pro Met 

50 :>5 g:; 

Gly Arg He Val Glu l.lo Ty:: Gly Pro j ru-- : Gly Lys Thr Thr 
65 70 7:; 80 

Leu Thr Leu Gin Val He Ala Ala Ala Gin Arq G- u Gly Lys Thr Cys 
85 95 

Ala Phe He Asp Ala Glu ILiji Ala Leu A.v;p Pro He Tyr Ala Arg Lys 

100 ir^: 110 

Leu Gly Val Asp He Asp Asn Leu Lcj Gys Ser Gin Pro Asp Thr Gly 

H. 5 120 125 

Glu Gin Ala Leu Glu He Cys Asp Ala Leu Ala Arc Ser Gly Ala Val 

13C :;f; - ■ 

Asp Val He Val Val Asp Sor Val Ala Ala Lov Thr Pro Lys Ala Glu 
145 15^^ 160 

He Glu Gly Glu He Gly Asp Ser His Mr>t Gly - r-; Ala Ala Arg Met 
165 175 

Met Ser Gin Ala Met Arg 1-ys Leu Ala G.i y Asn !/.:;; !.ys Gin Ser Asn 
inO 1P'' 190 

Thr Leu Leu He Phe He Asn Gin i : Arq Met i.ys He Gly Val Met 

1 95 200 . ;'C5 

Phe Gly Asn Pro Glu Thr Thr Thr Gly Gly Asn A,ln Leu Lys Phe Tyr 

210 . ■ 2::^ 

Ala Ser Val Arg Leu Asp I l.r^ Arg Arg He Gly - Val Lys Glu Gly 
225 230 y.-"- 240 

Glu Asn Val Val Gly Ser Glu Thr Arg Val l,y : 1 Val Lys Asn Lys 
24 5 . .255 

He Ala Ala Pro Phe Lys G.1 :i Ala Glu P Gin : e Leu Tyr Gly Glu 

260 ?r;': 270 

Gly He Asn Phe Tyr Gly Glu Leu V- i Ar;p Let^ Gly v.il Lys Glu Lys 
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275 280 :'^85 

Leu lie Glu Lys Ala Gly Ala Trp Tyr Scr Tvr L^s Gly Glu Lys lie 
5 290 295 3: 

Gly Gin Gly Lys Ala Asn Ala Thr Ala 'i'rp l,y:] i\3p Asn Pro Glu 

305 310 J ■ 320 

10 Thr Ala Lys Glu He Glu Lys Lys Val Arg Glu Lc;u Leu Leu Ser Asn 

325 330 335 

Pro Asn Ser Thr Pro Asp Phe Sr^r Val A'^p A-^p S^vr G] u Gly Val Ala 
340 3^15 350 


Glu Thr Asn Glu Asp Phe 
355 
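
SEQ ID NO: 15 is described in the listing as a consensus E. coli sequence, so each individual protein sequence can be read as a set of substitutions relative to it. The following is a minimal Python sketch of that comparison, not part of the original disclosure. The function names `parse_residues` and `substitutions` are hypothetical, and the two fragments are only the first eight residues of SEQ ID NO: 15 and SEQ ID NO: 10 as read from the listing.

```python
from typing import List, Tuple


def parse_residues(text: str) -> List[str]:
    """Split three-letter-code sequence text into residues, skipping position numbers."""
    return [token for token in text.split() if token.isalpha()]


def substitutions(reference: str, variant: str) -> List[Tuple[int, str, str]]:
    """Return (position, reference residue, variant residue) wherever the two differ."""
    ref = parse_residues(reference)
    var = parse_residues(variant)
    if len(ref) != len(var):
        raise ValueError("sequences must have equal length for a position-wise comparison")
    return [(pos, r, v) for pos, (r, v) in enumerate(zip(ref, var), start=1) if r != v]


if __name__ == "__main__":
    # Illustrative fragments only (residues 1-8), not the full 358-residue sequences.
    consensus_fragment = "Met Thr Gly Val Lys Met Ala Ile"
    variant_fragment = "Met Thr Gly Val Asn Met Ala Ile"
    print(substitutions(consensus_fragment, variant_fragment))
```

Because position numbers are filtered out by `parse_residues`, sequence text can be pasted together with its numbering lines. The sketch assumes equal-length sequences without insertions or deletions, which holds for the 358-residue entries listed here.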




Claims 

1. A method of evolving a cell to acquire a desired property, comprising:

(i.) forming protoplasts of a population of different cells;

(ii.) fusing the protoplasts to form hybrid protoplasts, in which genomes from the protoplasts recombine to form
hybrid genomes;

(iii.) incubating the hybrid protoplasts under conditions promoting regeneration of cells, thereby producing
regenerated cells;

(iv.) repeatedly forming protoplasts from the regenerated cells, fusing the protoplasts to form hybrid protoplasts,
in which genomes from the protoplasts recombine to form additional hybrid genomes; incubating the additional
hybrid protoplasts under conditions promoting regeneration of cells, thereby producing additional regenerated
cells; and,

(v.) selecting or screening to isolate regenerated cells or additional regenerated cells that have evolved toward
acquisition of the desired property.

2. The method of claim 1, wherein the desired property is selected from [illegible] tolerance, ethanol production, ethanol
tolerance, acid, improved production and maintenance of [illegible], improved production and maintenance
of NAD(P)H, and improved glucose transport.

3. The method of claim 1, further comprising repeating steps (i.)-(v.) with regenerated cells in step (iii.) or additional
regenerated cells in step (iv.) being used to form the protoplasts in step (i.) until the regenerated cells have acquired
the desired property.

4. The method of claim 1, comprising step (iv), wherein step (iv) is performed prior to step (v.).

5. The method of claim 1, wherein the hybrid protoplasts comprise cells having more than two parental genomes.

6. The method of claim 1, wherein the different cells are fungal cells, and the regenerated cells are fungi mycelia.

7. The method of claim 6, wherein protoplasts are provided by treating mycelia or spores with an enzyme.

8. The method of claim 6, wherein the fungal cells are from a fragile strain, lacking capacity for intact cell wall synthesis,
whereby protoplasts form spontaneously.

9. The method of claim 6, further comprising treating the mycelia with an inhibitor of cell wall formation to generate
protoplasts.

10. The method of claim 1, further comprising selecting or screening to isolate regenerated cells with hybrid genomes
free from cells with parental genomes.

11. The method of claim 1, wherein a first subpopulation of cells contain a first marker and the second subpopulation
of cells contain a second marker, and the method further comprising selecting or screening to identify regenerated
cells expressing both the first and second marker.

12. The method of claim 1, wherein the first marker is a membrane marker and the second marker is a genetic marker.

13. The method of claim 1, wherein the first marker is a first subunit of a heteromeric enzyme and the second marker
is a second subunit of the heteromeric enzyme.

14. The method of claim 1, further comprising transforming protoplasts with a library of DNA fragments in at least one
cycle.

15. The method of claim 14, wherein the DNA fragments are accompanied by a restriction enzyme.

16. The method of claim 1, further comprising exposing the protoplasts to ultraviolet irradiation in at least one cycle.

17. The method of claim 1, wherein the desired property is the expression of a protein, primary metabolite, or secondary
metabolite.

18. The method of claim 1, wherein the desired property is the secretion of a protein or secondary metabolite.

19. The method of claim 18, wherein the secondary metabolite is selected from taxol, cyclosporin A, and erythromycin.

20. The method of claim 1, wherein the desired property is capacity for meiosis.

21. The method of claim 1, wherein the desired property is compatibility to form a heterokaryon with another strain.

22. The method of claim 1, further comprising exposing the protoplasts or mycelia to a mutagenic agent in at least one
cycle.

23. A method of producing a library of diverse multicellular organisms, the method comprising:

providing a pool of male gametes and a pool of female gametes, wherein at least one of the male pool or the
female pool comprises a plurality of different gametes derived from different strains of a species or different
species, wherein the male gametes fertilize the female gametes;

permitting at least a portion of the resulting fertilized gametes to grow into reproductively viable organisms;
repeatedly crossing the reproductively viable organisms to produce a library of diverse organisms; and,

selecting the library for a desired trait or property.

24. The method of claim 23 wherein the library of diverse organisms comprise a plurality of plants.

25. The method of claim 24 wherein the plants are selected from Gramineae, Fetucoideae, Poacoideae, Agrostis,
Phleum, Dactylis, Sorgum, Setaria, Zea, Oryza, Triticum, Secale, Avena, Hordeum, Saccharum, Poa, Festuca,
Stenotaphrum, Cynodon, Coix, Olyreae, Phareae, Compositae, and Leguminosae.

26. The method of claim 24 wherein the plants are selected from corn, rice, wheat, rye, oats, barley, pea, beans, lentil,
peanut, yam bean, cowpeas, velvet beans, soybean, clover, alfalfa, lupine, vetch, lotus, sweet clover, wisteria,
sweetpea, sorghum, millet, sunflower, and canola.

27. The method of claim 23 wherein the library of diverse organisms comprise a plurality of animals.

28. The method of claim 27 wherein the animals are selected from non-human mammals and fish.

29. The library produced by the method of claim 23.

30. The method of claim 23 further comprising:

crossing a plurality of selected library members by pooling gametes from the selected members and repeatedly
crossing any resulting additional reproductively viable organisms to produce a second library of diverse
organisms; and,

selecting the second library for a desired trait or property.

31. The second library made by the method of claim 30.

32. A method for whole genome shuffling through organized heteroduplex shuffling, the method comprising:

(a). providing chromosomal DNA of an organism which is targeted for shuffling, digesting the chromosomal
DNA with one or more restriction enzymes, ligating the resulting DNA into a cosmid, the cosmid comprising
at least two rare restriction enzyme recognition sites, [illegible], purifying, and storing sufficient cosmids to
represent a complete chromosome;

(b). mutagenizing aliquots of the library in vitro using a mutagen;

(c). transfecting a sample from a plurality of the mutagenized aliquots into a population of target cells;

(d). assaying the transfectants for phenotypic improvements;

(e). growing transfected cells harboring a mutant library of the identified cosmid(s) on media and screening the
resulting cell colonies for independent mutants conferring a desired phenotype;

(f). isolating and pooling DNA from cells identified in the screening;

(g). dividing the selected pools and digesting at least one sample with a rare-cutting restriction enzyme, pooling
the cleaved samples, denaturing the samples, reannealing the samples and religating the samples; and,

(h). transfecting target cells with the resulting heteroduplexes and propagating the cells to allow recombination
to occur between the strands of the heteroduplexes in vivo.

33. The method of claim 32, further comprising additionally screening the transfectants.

34. The method of claim [illegible], further comprising further shuffling the heteroduplexes by recursive in vitro heteroduplex
formation and in vivo recombination prior to additionally screening the transfectants.

35. The method of claim [illegible], further comprising performing an additional mutagenesis step to increase diversity during
the shuffling process.

36. The method of claim [illegible], further comprising combining one or more heteroduplexes into a host chromosome by
chromosome integration.

37. The method of claim 36, further comprising repeating steps (a)-(h), using the organism resulting from chromosome
integration as the source for chromosomal DNA in step (a).

38. The method of claim 3[illegible], wherein the cosmid comprises [illegible] or NotI.

39. The method of claim [illegible], wherein the transfectants are assayed [illegible] each mutagenized aliquot.

40. The method of claim 3[illegible], wherein a positive assay result indicates that a cosmid from a particular aliquot can confer
phenotypic improvements and thus contains large genomic fragments that are suitable targets for heteroduplex mediated
shuffling.

41. The method of claim [illegible], wherein the mutagen is a chemical mutagen.

42. The method of claim 32, wherein growing transfected cells harboring a mutant library of the identified cosmid(s) on
media comprises plating the transfected cells on solid media.
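
The recursive cycle recited in claim 1 (form protoplasts, fuse, regenerate, repeat, then select) can be pictured with a toy simulation. The Python sketch below is an illustrative assumption and not the claimed method itself: genomes are modeled as lists of 0/1 alleles, fusion is modeled as drawing each locus from a randomly chosen parent, and selection is modeled as ranking by the number of beneficial alleles. All names (`fuse`, `fusion_cycle`, `evolve`) and parameters are hypothetical.

```python
import random
from typing import List

Genome = List[int]  # toy genome: 1 marks a beneficial allele at a locus


def fuse(parents: List[Genome], rng: random.Random) -> Genome:
    """Build a hybrid genome by drawing each locus from a randomly chosen parent."""
    return [rng.choice(parents)[locus] for locus in range(len(parents[0]))]


def fusion_cycle(cells: List[Genome], rng: random.Random,
                 parents_per_fusion: int = 2) -> List[Genome]:
    """One pooled cycle: fuse randomly drawn parents, regenerating as many cells as went in."""
    return [fuse(rng.sample(cells, parents_per_fusion), rng) for _ in cells]


def evolve(cells: List[Genome], cycles: int, keep: int, seed: int = 0) -> List[Genome]:
    """Run repeated fusion cycles without selection, then keep the best-scoring cells."""
    rng = random.Random(seed)
    for _ in range(cycles):  # steps (i.)-(iv.): form, fuse, regenerate, repeat
        cells = fusion_cycle(cells, rng)
    # step (v.): selection or screening, modeled here as ranking by allele count
    return sorted(cells, key=sum, reverse=True)[:keep]


if __name__ == "__main__":
    # Four hypothetical parents, each carrying one beneficial allele at a different locus.
    parents = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    population = parents * 25  # 100 starting cells
    print(evolve(population, cycles=3, keep=5))
```

Raising `parents_per_fusion` above two gives a rough picture of hybrids with more than two parental genomes, as in claim 5. The model ignores linkage, ploidy, and regeneration efficiency, so it shows only how repeated pooling lets alleles from several parents accumulate in one genome.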
[Figure: nucleotide sequence alignment of the sequences labeled New Minshall [label partly illegible], New Clone 2, New Clone 4, New Clone 5, New Clone 6, and complete 13, shown beneath a consensus line with alignment positions numbered from 10 to about 430; the aligned bases are not recoverable from this extraction.]